📊 Digest statistics

Total digests: 34022
Added today: 82

Last updated: today

📄 Evolving Prompts for Toxicity Search in Large Language Models

2025-11-19

Authors:

Onkar Shelar, Travis Desell

Annotation:

Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a black-box evolutionary framework that tests model safety by evolving prompts in a synchronous steady-state loop. The system employs a diverse set of operators, including lexical substitutions, negation, back-translation, paraphrasing, and two semantic crossover operators, while a moderation oracle provides fitness guidance. Operator-level analysis shows he...
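The abstract describes a steady-state evolutionary loop. In it, variation operators (mutations and crossover) produce new prompts and a moderation oracle supplies the fitness score. The following is a minimal, generic sketch of that kind of loop. It is not ToxSearch's implementation. Its assumptions:

- Word-swap and word-drop mutations stand in for the paper's operators.
- A one-point word crossover stands in for the semantic crossover.
- A hypothetical `toy_score` function stands in for the moderation oracle.
- Each step creates one child, which replaces the current worst individual if it scores higher.

All function names here are illustrative only.

```python
import random
from typing import Callable, List, Tuple

Scored = Tuple[float, str]  # (fitness, prompt)

def swap_words(prompt: str, rng: random.Random) -> str:
    """Hypothetical mutation: swap two randomly chosen words."""
    words = prompt.split()
    if len(words) < 2:
        return prompt
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def drop_word(prompt: str, rng: random.Random) -> str:
    """Hypothetical mutation: delete one randomly chosen word."""
    words = prompt.split()
    if len(words) < 2:
        return prompt
    del words[rng.randrange(len(words))]
    return " ".join(words)

def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover on word sequences (a purely syntactic stand-in)."""
    wa, wb = a.split(), b.split()
    child = wa[: rng.randint(0, len(wa))] + wb[rng.randint(0, len(wb)):]
    return " ".join(child) if child else a

def tournament(pop: List[Scored], rng: random.Random) -> str:
    """Binary tournament: return the prompt of the fitter of two random individuals."""
    return max(rng.sample(pop, 2))[1]

def steady_state_search(seeds: List[str], fitness: Callable[[str], float],
                        steps: int, seed: int = 0) -> Scored:
    """Evolve prompts one offspring at a time; needs at least two seeds."""
    rng = random.Random(seed)
    pop: List[Scored] = [(fitness(p), p) for p in seeds]  # score each seed once
    mutators = [swap_words, drop_word]
    for _ in range(steps):
        if rng.random() < 0.5:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
        else:
            child = rng.choice(mutators)(tournament(pop, rng), rng)
        score = fitness(child)  # one oracle call per offspring
        worst = min(range(len(pop)), key=lambda k: pop[k])
        if score > pop[worst][0]:  # replace the worst only if the child beats it
            pop[worst] = (score, child)
    return max(pop)

# Hypothetical oracle: fraction of words that belong to a harmless target set.
TARGET = {"alpha", "beta"}

def toy_score(prompt: str) -> float:
    words = prompt.split()
    return sum(w in TARGET for w in words) / len(words) if words else 0.0

seeds = ["alpha one two three", "four beta five six", "seven eight nine ten"]
best_score, best_prompt = steady_state_search(seeds, toy_score, steps=200)
```

Because a child only ever replaces the worst member, the best score in the population never decreases. That monotonicity is one reason steady-state replacement is a common choice when every fitness evaluation is an expensive oracle call.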

ID: 2511.12487v1 cs.NE, cs.AI, cs.CL

arXiv PDF

📄 Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing

2025-11-19

Authors:

Sanchaita Hazra, Doeun Lee, Bodhisattwa Prasad Majumder, Sachin Kumar

Annotation:

Large Language Models have seen expanding application across domains, yet their effectiveness as assistive tools for scientific writing -- an endeavor requiring precision, multimodal synthesis, and domain expertise -- remains insufficiently understood. We examine the potential of LLMs to support domain experts in scientific writing, with a focus on abstract composition. We design an incentivized randomized controlled trial with a hypothetical conference setup where participants with relevant exp...

ID: 2511.12529v1 cs.HC, cs.AI, cs.CL

arXiv PDF

📄 Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments

2025-11-19

Authors:

Samuel Nathanson, Rebecca Williams, Cynthia Matuszek

Annotation:

Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or restricted behavior despite alignment safeguards. Using standardized adversarial tasks from JailbreakBench, we simulate over 6,000 multi-turn attacker-target exchanges across major LLM families and sca...

ID: 2511.13788v1 cs.LG, cs.AI, cs.CL, cs.CR, cs.MA

arXiv PDF

📄 WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

2025-11-19

Authors:

Genglin Liu, Shijie Geng, Sha Li, Hejie Cui, Sarah Zhang, Xin Liu, Tianyi Liu

Annotation:

Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, enabling agents to complete complex browsing tasks across diverse domains. However, current agents struggle with repetitive errors and lack the ability to learn from past experiences across sessions, limiting their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory, enablin...

ID: 2511.12997v1 cs.AI, cs.CL

arXiv PDF

📄 PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics

2025-11-19

Authors:

Sachin Vashistha, Aryan Bibhuti, Atharva Naik, Martin Tutek, Somak Aditya

Annotation:

Real-world conversations are rich with pragmatic elements, such as entity mentions, references, and implicatures. Understanding such nuances is a requirement for successful natural communication, and often requires building a local world model which encodes such elements and captures the dynamics of their evolving states. However, it is not well-understood whether language models (LMs) construct or maintain a robust implicit representation of conversations. In this work, we evaluate the ability ...

ID: 2511.13021v1 cs.AI, cs.CL

arXiv PDF

📄 When AI Does Science: Evaluating the Autonomous AI Scientist KOSMOS in Radiation Biology

2025-11-19

Authors:

Humza Nusrat, Omar Nusrat

Annotation:

Agentic AI "scientists" now use language models to search the literature, run analyses, and generate hypotheses. We evaluate KOSMOS, an autonomous AI scientist, on three problems in radiation biology using simple random-gene null benchmarks. Hypothesis 1: baseline DNA damage response (DDR) capacity across cell lines predicts the p53 transcriptional response after irradiation (GSE30240). Hypothesis 2: baseline expression of OGT and CDO1 predicts the strength of repressed and induced radiation-res...
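The abstract evaluates each hypothesis against "simple random-gene null benchmarks". One common form of such a null is assumed here rather than taken from the paper:

1. Score the hypothesized gene set.
2. Score many random gene sets of the same size.
3. Report the empirical p-value (1 + hits) / (1 + draws), where hits counts the random sets that score at least as high.

The following is a minimal sketch under those assumptions. The per-gene statistic, the mean-absolute-value set score, the function names and the toy values are all hypothetical. They are not data or code from the study.

```python
import random
from statistics import mean
from typing import Dict, Sequence, Tuple

def set_score(genes: Sequence[str], per_gene: Dict[str, float]) -> float:
    """Hypothetical set score: mean absolute per-gene statistic."""
    return mean(abs(per_gene[g]) for g in genes)

def random_gene_null(hypothesis_genes: Sequence[str], per_gene: Dict[str, float],
                     n_draws: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare a gene set with same-size random gene sets; return (score, empirical p)."""
    rng = random.Random(seed)
    observed = set_score(hypothesis_genes, per_gene)
    universe = list(per_gene)
    k = len(hypothesis_genes)
    # Count random draws that do at least as well as the hypothesized set.
    hits = sum(set_score(rng.sample(universe, k), per_gene) >= observed
               for _ in range(n_draws))
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + hits) / (1 + n_draws)

# Synthetic toy statistics: five "strong" genes among 100.
per_gene = {f"g{i}": (0.9 if i < 5 else 0.1) for i in range(100)}
observed, p = random_gene_null([f"g{i}" for i in range(5)], per_gene)
```

A hypothesis that does no better than random gene sets of the same size gets a large p-value under this benchmark. That is the failure mode such a null is designed to expose.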

ID: 2511.13825v1 cs.AI, cs.CL

arXiv PDF

📄 STEP: Success-Rate-Aware Trajectory-Efficient Policy Optimization

2025-11-19

Authors:

Yuhan Chen, Yuxuan Liu, Long Zhang, Pengzhi Gao, Jian Luan, Wei Liu

Annotation:

Multi-turn interaction remains challenging for online reinforcement learning. A common solution is trajectory-level optimization, which treats each trajectory as a single training sample. However, this approach can be inefficient and yield misleading learning signals: it applies uniform sampling across tasks regardless of difficulty, penalizes correct intermediate actions in failed trajectories, and incurs high sample-collection costs. To address these issues, we propose STEP (Success-rate-aware...

ID: 2511.13091v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms

2025-11-19

Authors:

Patrick Parschan, Charlott Jakob

Annotation:

This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent political positions from textual data. These algorithms are widely used in political science, communication, computational social science, and computer science to estimate ideological preferences from parliamentary speeches, party manifestos, and social media. Over the past two decades, their development has cl...

ID: 2511.13238v1 cs.LG, cs.AI, cs.CL, cs.CY

arXiv PDF

📄 Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment

2025-11-19

Authors:

Jea Kwon, Luiz Felipe Vecchietti, Sungwon Park, Meeyoung Cha

Annotation:

Humans display significant uncertainty when confronted with moral dilemmas, yet the extent of such uncertainty in machines and AI agents remains underexplored. Recent studies have confirmed the overly confident tendencies of machine-generated responses, particularly in large language models (LLMs). As these systems are increasingly embedded in ethical decision-making scenarios, it is important to understand their moral reasoning and the inherent uncertainties in building reliable AI systems. Thi...

ID: 2511.13290v1 cs.AI, cs.CL, cs.CY

arXiv PDF

📄 AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research

2025-11-19

Authors:

Alexandru-Mihai Apostu, Andrei Preda, Alexandra Daniela Damir, Diana Bolocan, Radu Tudor Ionescu, Ioana Croitoru, Mihaela Gaman

Annotation:

Generating thorough natural language explanations for threat detections remains an open problem in cybersecurity research, despite significant advances in automated malware detection systems. In this work, we present AutoMalDesc, an automated static analysis summarization framework that, following initial training on a small set of expert-curated examples, operates independently at scale. This approach leverages an iterative self-paced learning pipeline to progressively enhance output quality th...

ID: 2511.13333v1 cs.CR, cs.AI, cs.CL, cs.LG

arXiv PDF

Showing 141-150 of 1292 records