📊 Digest statistics
Total digests: 34022 Added today: 0
Last updated: today
Authors:
Raza Imam, Hu Wang, Dwarikanath Mahapatra, Mohammad Yaqub
Russian summary not found
Available fields: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
In medical imaging, vision-language models face a critical duality:
pretrained networks offer broad robustness but lack subtle, modality-specific
characteristics, while fine-tuned expert models achieve high in-distribution
accuracy yet falter under modality shift. Existing model-merging techniques,
designed for natural-image benchmarks, are simple and efficient but fail to
deliver consistent gains across diverse medical modalities; their static
interpolation limits reliability in varied clinical...
Authors:
Kenneth Yang, Wen-Li Wei, Jen-Chun Lin
Annotation:
Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for
adapting large-scale pre-trained models to downstream tasks, but existing
approaches face notable limitations. Addition-based methods, such as Adapters
[1], introduce inference latency and engineering complexity, while
selection-based methods like Gradient-based Parameter Selection (GPS) [2]
require a full backward pass, which results in the same peak memory usage as
full fine-tuning. To address this dilemma, we propose Fee...
Authors:
Wu Wei, Xiaomeng Fan, Yuwei Wu, Zhi Gao, Pengxiang Li, Yunde Jia, Mehrtash Harandi
Annotation:
Modality alignment is critical for vision-language models (VLMs) to
effectively integrate information across modalities. However, existing methods
extract hierarchical features from text while representing each image with a
single feature, leading to asymmetric and suboptimal alignment. To address
this, we propose Alignment across Trees, a method that constructs and aligns
tree-like hierarchical features for both image and text modalities.
Specifically, we introduce a semantic-aware visual featu...
Authors:
Joyoni Dey, Hunter C. Meyer, Murtuza S. Taqi
Annotation:
Low-dose computed tomography (LDCT) is the current standard for lung cancer
screening, yet its adoption and accessibility remain limited. Many regions lack
LDCT infrastructure, and even among those screened, early-stage cancer
detection often yields false positives, as shown in the National Lung Screening
Trial (NLST) with a sensitivity of 93.8 percent and a false-positive rate of
26.6 percent. We aim to investigate whether X-ray dark-field imaging (DFI)
radiograph, a technique sensitive to small...
Authors:
Jingjun Bi, Fadi Dornaika
Annotation:
Recently, graph-based semi-supervised learning and pseudo-labeling have
gained attention due to their effectiveness in reducing the need for extensive
data annotations. Pseudo-labeling uses predictions from unlabeled data to
improve model training, while graph-based methods are characterized by
processing data represented as graphs. However, the lack of clear graph
structures in images combined with the complexity of multi-view data limits the
efficiency of traditional and existing techniques. M...
Authors:
Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Vicky Kalogeiton, David Picard
Annotation:
Current text-to-image generative models are trained on large uncurated
datasets to enable diverse generation capabilities. However, this does not
align well with user preferences. Recently, reward models have been
specifically designed to perform post-hoc selection of generated images and
align them to a reward, typically user preference. Discarding informative
data and optimizing for a single reward tend to harm
diversity, semantic fidelity and efficiency. Instead of this ...
Authors:
Pei Peng, MingKun Xie, Hang Hao, Tong Jin, ShengJun Huang
Annotation:
Object-context shortcuts remain a persistent challenge in vision-language
models, undermining zero-shot reliability when test-time scenes differ from
familiar training co-occurrences. We recast this issue as a causal inference
problem and ask: Would the prediction remain if the object appeared in a
different environment? To answer this at inference time, we estimate object and
background expectations within CLIP's representation space, and synthesize
counterfactual embeddings by recombining obje...
📄 CYPRESS: Crop Yield Prediction via Regression on Prithvi's Encoder for Satellite Sensing
2025-11-01
Authors:
Shayan Nejadshamsi, Yuanyuan Zhang, Shadi Zaki, Brock Porth, Lysa Porth, Vahab Khoshdel
Annotation:
Accurate and timely crop yield prediction is crucial for global food security
and modern agricultural management. Traditional methods often lack the
scalability and granularity required for precision farming. This paper
introduces CYPRESS (Crop Yield Prediction via Regression on Prithvi's Encoder
for Satellite Sensing), a deep learning model designed for high-resolution,
intra-field canola yield prediction. CYPRESS leverages a pre-trained,
large-scale geospatial foundation model (Prithvi-EO-2.0-...
📄 SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
2025-11-01
Authors:
Anushka Sivakumar, Andrew Zhang, Zaber Hakim, Chris Thomas
Annotation:
This work introduces SteerVLM, a lightweight steering module designed to
guide Vision-Language Models (VLMs) towards outputs that better adhere to
desired instructions. Our approach learns from the latent embeddings of paired
prompts encoding target and converse behaviors to dynamically adjust
activations connecting the language modality with image context. This allows
for fine-grained, inference-time control over complex output semantics without
modifying model weights while preserving performa...
Authors:
Valentyna Starodub, Mantas Lukoševičius
Annotation:
Age-related macular degeneration (AMD) is one of the leading causes of
irreversible vision impairment in people over the age of 60. This research
focuses on semantic segmentation for AMD lesion detection in RGB fundus images,
a non-invasive and cost-effective imaging technique. The results of the ADAM
challenge, the most comprehensive research competition and open dataset to
date for AMD detection from RGB fundus images, serve as a benchmark for our
evaluation. Taking the U-Net connectivity as a b...
Showing 221-230 of 835 entries