📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts

2025-11-27

Авторы:

Xing Wang, Huiyuan Xie, Yiyan Wang, Chaojun Xiao, Huimin Chen, Holli Sargeant, Felix Steffek, Jie Shao, Zhiyuan Liu, Maosong Sun

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models (LLMs) are now deployed at unprecedented scale, assisting millions of users in daily tasks. However, the risk of these models assisting unlawful activities remains underexplored. In this study, we define this high-risk behavior as complicit facilitation - the provision of guidance or support that enables illicit user instructions - and present four empirical studies that assess its prevalence in widely deployed LLMs. Using real-world legal cases and established legal framew...

ID: 2511.20736v1 cs.CY, cs.AI, cs.CL

arXiv PDF

📄 CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design

2025-11-27

Авторы:

Daeheon Jeong, Seoyeon Byun, Kihoon Son, Dae Hyun Kim, Juho Kim

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

User interface (UI) design is an iterative process in which designers progressively refine their work with design software such as Figma or Sketch. Recent advances in vision language models (VLMs) with tool invocation suggest these models can operate design software to edit a UI design through iteration. Understanding and enhancing this capacity is important, as it highlights VLMs' potential to collaborate with designers within conventional software. However, as no existing benchmark evaluates t...

ID: 2511.20737v1 cs.CV, cs.AI, cs.CL

arXiv PDF

📄 DesignPref: Capturing Personal Preferences in Visual Design Generation

2025-11-27

Авторы:

Yi-Hao Peng, Jeffrey P. Bigham, Jason Wu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generative models, such as large language models and text-to-image diffusion models, are increasingly used to create visual designs like user interfaces (UIs) and presentation slides. Finetuning and benchmarking these generative models have often relied on datasets of human-annotated design preferences. Yet, due to the subjective and highly personalized nature of visual design, preference varies widely among individuals. In this paper, we study this problem by introducing DesignPref, a dataset o...

ID: 2511.20513v1 cs.CV, cs.AI, cs.CL, cs.HC

arXiv PDF

📄 Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion

2025-11-27

Авторы:

Samuele Dell'Erba, Andrew D. Bagdanov

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Diffusion models have established the state-of-the-art in text-to-image generation, but their performance often relies on a diffusion prior network to translate text embeddings into the visual manifold for easier decoding. These priors are computationally expensive and require extensive training on massive datasets. In this work, we challenge the necessity of a trained prior at all by employing Optimization-based Visual Inversion (OVI), a training-free and data-free alternative, to replace the n...

ID: 2511.20821v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries

2025-11-27

Авторы:

Sree Bhattacharyya, Yaman Kumar Singla, Sudhir Yarram, Somesh Kumar Singh, Harini S, James Z. Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Visual content memorability has intrigued the scientific community for decades, with applications ranging widely, from understanding nuanced aspects of human memory to enhancing content design. A significant challenge in progressing the field lies in the expensive process of collecting memorability annotations from humans. This limits the diversity and scalability of datasets for modeling visual content memorability. Most existing datasets are limited to collecting aggregate memorability scores ...

ID: 2511.20854v1 cs.CV, cs.AI, cs.CL

arXiv PDF

📄 ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

2025-11-27

Авторы:

Qineng Wang, Wenlong Huang, Yu Zhou, Hang Yin, Tianwei Bao, Jianwen Lyu, Weiyu Liu, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, Manling Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Embodied cognition argues that intelligence arises from sensorimotor interaction rather than passive observation. It raises an intriguing question: do modern vision-language models (VLMs), trained largely in a disembodied manner, exhibit signs of embodied cognition? We introduce ENACT, a benchmark that casts evaluation of embodied cognition as world modeling from egocentric interaction in a visual question answering (VQA) format. Framed as a partially observable Markov decision process (POMDP) w...

ID: 2511.20937v1 cs.AI, cs.CL, cs.CV, cs.RO

arXiv PDF

📄 Towards Audio Token Compression in Large Audio Language Models

2025-11-27

Авторы:

Saurabhchand Bhati, Samuel Thomas, Hilde Kuehne, Rogerio Feris, James Glass

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Audio Language Models (LALMs) demonstrate impressive performance across diverse tasks, ranging from speech recognition to general audio understanding. However, their scalability is limited by the quadratic complexity of attention and the high token rates of audio signals. These challenges make it difficult to extend LALMs to long-form audio and to deploy them on resource-constrained platforms such as edge devices. In this paper, we explore techniques such as unsupervised segmentation, un...

ID: 2511.20973v1 eess.AS, cs.AI, cs.CL

arXiv PDF

📄 TALES: A Taxonomy and Analysis of Cultural Representations in LLM-generated Stories

2025-11-27

Авторы:

Kirti Bhagat, Shaily Bhatt, Athul Velagapudi, Aditya Vashistha, Shachi Dave, Danish Pruthi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Millions of users across the globe turn to AI chatbots for their creative needs, inviting widespread interest in understanding how such chatbots represent diverse cultures. At the same time, evaluating cultural representations in open-ended tasks remains challenging and underexplored. In this work, we present TALES, an evaluation of cultural misrepresentations in LLM-generated stories for diverse Indian cultural identities. First, we develop TALES-Tax, a taxonomy of cultural misrepresentations b...

ID: 2511.21322v1 cs.HC, cs.AI, cs.CL, cs.CY

arXiv PDF

📄 Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis

2025-11-27

Авторы:

Jiyun Bae, Hyunjong Ok, Sangwoo Mo, Jaeho Lee

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

How does irrelevant information (i.e., distractors) affect test-time scaling in vision-language models (VLMs)? Prior studies on language models have reported an inverse scaling effect, where textual distractors lead to longer but less effective reasoning. To investigate whether similar phenomena occur in multimodal settings, we introduce Idis (Images with distractors), a visual question-answering dataset that systematically varies distractors along semantic, numerical, and spatial dimensions. Ou...

ID: 2511.21397v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Subjective Depth and Timescale Transformers: Learning Where and When to Compute

2025-11-27

Авторы:

Frederico Wieser, Martin Benfeghoul, Haitham Bou Ammar, Jun Wang, Zafeirios Fountas

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), two distinct architectures that leverage Bayesian surprise signals to dynamically route computation, learning where and when to compute within decoder-only TFs. SDT augments a decoder-only stack with a...

ID: 2511.21408v1 cs.LG, cs.AI, cs.CL, cs.IT

arXiv PDF

Показано 71 - 80 из 1292 записей