📄 MEC$^3$O: Multi-Expert Consensus for Code Time Complexity Prediction

2025-10-14

Authors:

Joonghyuk Hahn, Soohan Lim, Yo-Sub Han

Annotation:

Predicting the complexity of source code is essential for software development and algorithm analysis. Recently, Baik et al. (2025) introduced CodeComplex for code time complexity prediction. The paper shows that LLMs without fine-tuning struggle with certain complexity classes. This suggests that no single LLM excels at every class, but rather each model shows advantages in certain classes. We propose MEC$^3$O, a multi-expert consensus system, which extends the multi-agent debate frameworks. ME...

ID: 2510.09049v1 cs.AI, cs.SE, 68T50, I.2.7

📄 Traceability and Accountability in Role-Specialized Multi-Agent LLM Pipelines

2025-10-11

Authors:

Amine Barrak

Annotation:

Sequential multi-agent systems built with large language models (LLMs) can automate complex software tasks, but they are hard to trust because errors quietly pass from one stage to the next. We study a traceable and accountable pipeline, meaning a system with clear roles, structured handoffs, and saved records that let us trace who did what at each step and assign blame when things go wrong. Our setting is a Planner -> Executor -> Critic pipeline. We evaluate eight configurations of three state-...

ID: 2510.07614v1 cs.AI, cs.SE

📄 Platform-Agnostic Modular Architecture for Quantum Benchmarking

2025-10-11

Authors:

Neer Patel, Anish Giri, Hrushikesh Pramod Patil, Noah Siekierski, Avimita Chatterjee, Sonika Johri, Timothy Proctor, Thomas Lubinski, Siyuan Niu

Annotation:

We present a platform-agnostic modular architecture that addresses the increasingly fragmented landscape of quantum computing benchmarking by decoupling problem generation, circuit execution, and results analysis into independent, interoperable components. Supporting over 20 benchmark variants ranging from simple algorithmic tests like Bernstein-Vazirani to complex Hamiltonian simulation with observable calculations, the system integrates with multiple circuit generation APIs (Qiskit, CUDA-Q, Ci...

ID: 2510.08469v1 quant-ph, cs.AI, cs.SE

📄 Vul-R2: A Reasoning LLM for Automated Vulnerability Repair

2025-10-09

Authors:

Xin-Cheng Wen, Zirui Lin, Yijun Yang, Cuiyun Gao, Deheng Ye

Annotation:

The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research has formulated AVR as a sequence generation problem and has leveraged large language models (LLMs) to address this problem. Typically, these approaches prompt or fine-tune LLMs to generate repairs for vulnerabilities directly. Although these methods show state-of-the-art performance, they face the following challenges: (1) Lack of high-quality, vulne...

ID: 2510.05480v1 cs.AI, cs.SE

📄 MulVuln: Enhancing Pre-trained LMs with Shared and Language-Specific Knowledge for Multilingual Vulnerability Detection

2025-10-08

Authors:

Van Nguyen, Surya Nepal, Xingliang Yuan, Tingmin Wu, Fengchao Chen, Carsten Rudolph

Annotation:

Software vulnerabilities (SVs) pose a critical threat to safety-critical systems, driving the adoption of AI-based approaches such as machine learning and deep learning for software vulnerability detection. Despite promising results, most existing methods are limited to a single programming language. This is problematic given the multilingual nature of modern software, which is often complex and written in multiple languages. Current approaches often face challenges in capturing both shared and ...

ID: 2510.04397v1 cs.CR, cs.AI, cs.SE

📄 Dissecting Transformers: A CLEAR Perspective towards Green AI

2025-10-07

Authors:

Hemang Jain, Shailender Goyal, Divyansh Pandey, Karthik Vaidhyanathan

Annotation:

The rapid adoption of Large Language Models (LLMs) has raised significant environmental concerns. Unlike the one-time cost of training, LLM inference occurs continuously at a global scale and now dominates the AI energy footprint. Yet, most sustainability studies report only coarse, model-level metrics due to the lack of fine-grained measurement methods, treating energy efficiency more as an afterthought than as a primary objective. We present the first fine-grained empirical analysis of inferen...

ID: 2510.02810v1 cs.LG, cs.AI, cs.SE

📄 From Facts to Foils: Designing and Evaluating Counterfactual Explanations for Smart Environments

2025-10-07

Authors:

Anna Trapp, Mersedeh Sadeghi, Andreas Vogelsang

Annotation:

Explainability is increasingly seen as an essential feature of rule-based smart environments. While counterfactual explanations, which describe what could have been done differently to achieve a desired outcome, are a powerful tool in eXplainable AI (XAI), no established methods exist for generating them in these rule-based domains. In this paper, we present the first formalization and implementation of counterfactual explanations tailored to this domain. It is implemented as a plugin that exten...

ID: 2510.03078v1 cs.AI, cs.SE

📄 MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement

2025-10-05

Authors:

Youpeng Li, Kartik Joshi, Xinda Wang, Eric Wong

Annotation:

The widespread adoption of open-source software (OSS) necessitates the mitigation of vulnerability risks. Most vulnerability detection (VD) methods are limited by inadequate contextual understanding, restrictive single-round interactions, and coarse-grained evaluations, resulting in undesired model performance and biased evaluation results. To address these challenges, we propose MAVUL, a novel multi-agent VD system that integrates contextual reasoning and interactive refinement. Specifically, a...

ID: 2510.00317v1 cs.CR, cs.AI, cs.SE

📄 90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development

2025-10-02

Authors:

Runxin Yang, Yuxuan Wan, Shuqing Li, Michael R. Lyu

Annotation:

Developing 3D games requires specialized expertise across multiple domains, including programming, 3D modeling, and engine configuration, which limits access to millions of potential creators. Recently, researchers have begun to explore automated game development. However, existing approaches face three primary challenges: (1) limited scope to 2D content generation or isolated code snippets; (2) requirement for manual integration of generated components into game engines; and (3) poor performanc...

ID: 2509.26161v1 cs.AI, cs.SE

📄 Diagnosing Failure Root Causes in Platform-Orchestrated Agentic Systems: Dataset, Taxonomy, and Benchmark

2025-10-01

Authors:

Xuyan Ma, Xiaofei Xie, Yawen Wang, Junjie Wang, Boyu Wu, Mingyang Li, Qing Wang

## Context

Agentic systems, which combine several Large Language Model (LLM)-based systems interacting through tools and structured interactions, are widely used to solve complex tasks. Improving such systems requires understanding their problems, including identifying where they fail. This has become especially relevant with the emergence of low-code platforms such as Dify, which make it possible to build and manage agentic systems quickly. However, the lack of methods for identifying the root causes of failures in such systems remains a significant problem.

## Method

To study the root causes of failures in agentic systems, the AgentFail dataset was created, containing 307 failure records from 10 different agentic systems. Each record received fine-grained annotations linking failures to their causes. To make this process more reliable, a debugging approach based on reflective root-cause search was used. A taxonomy was then developed to classify failure causes, and a benchmark was created for automated root-cause identification.

## Results

The study found 10 main causes of failures in agentic systems. Many of them are related to incorrect handling of metadata, incomplete task computation, and syntax errors. The benchmark showed that using the taxonomy raises root-cause identification accuracy to 33.6%, which indicates how difficult the task is. Nevertheless, the results indicate that root-cause identification can be improved with the taxonomy and with new analytical approaches.

## Significance

The proposed dataset and taxonomy can be applied to the development of new methods for testing and monitoring agentic systems. These tools would help optimize solutions, reduce cost, and increase the reliability of such systems. In the future, refining the taxonomy and integrating more sophisticated methods may raise the accuracy of root-cause identification.

## Conclusions

The work provides a reliable dataset and analytical tools for identifying and analyzing the root causes of failures in agentic systems. The results show that current root-cause identification accuracy is fairly low, which confirms the difficulty of the task. Nevertheless, the methods developed can serve as a foundation for further research in this area.

Annotation:

Agentic systems consisting of multiple LLM-driven agents coordinating through tools and structured interactions, are increasingly deployed for complex reasoning and problem-solving tasks. At the same time, emerging low-code and template-based agent development platforms (e.g., Dify) enable users to rapidly build and orchestrate agentic systems, which we refer to as platform-orchestrated agentic systems. However, these systems are also fragile and it remains unclear how to systematically identify...

ID: 2509.23735v1 cs.AI, cs.SE
