📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis

2025-11-01

Авторы:

Dong Huang, Mingzhe Du, Jie M. Zhang, Zheng Lin, Meng Luo, Qianru Zhang, See-Kiong Ng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Test oracle generation in non-regression testing is a longstanding challenge in software engineering, where the goal is to produce oracles that can accurately determine whether a function under test (FUT) behaves as intended for a given input. In this paper, we introduce Nexus, a novel multi-agent framework to address this challenge. Nexus generates test oracles by leveraging a diverse set of specialized agents that synthesize test oracles through a structured process of deliberation, validation...

ID: 2510.26423v1 cs.SE, cs.CL

arXiv PDF

📄 Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation

2025-10-30

Авторы:

Lingyue Fu, Bolun Zhang, Hao Guan, Yaoming Zhu, Lin Qiu, Weiwen Liu, Xuezhi Cao, Xunliang Cai, Weinan Zhang, Yong Yu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent advances in code agents have enabled automated software development at the project level, supported by large language models (LLMs) and widely adopted tools. However, existing benchmarks for code agent evaluation face two major limitations: high annotation cost and expertise requirements, and rigid evaluation metrics that rely primarily on unit tests. To address these challenges, we propose an agent-driven benchmark construction pipeline that leverages human supervision to efficiently gen...

ID: 2510.24358v1 cs.SE, cs.CL

arXiv PDF

📄 Wisdom and Delusion of LLM Ensembles for Code Generation and Repair

2025-10-28

Авторы:

Fernando Vallecillos Ruiz, Max Hort, Leon Moonen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Today's pursuit of a single Large Language Model (LMM) for all software engineering tasks is resource-intensive and overlooks the potential benefits of complementarity, where different models contribute unique strengths. However, the degree to which coding LLMs complement each other and the best strategy for maximizing an ensemble's potential are unclear, leaving practitioners without a clear path to move beyond single-model systems. To address this gap, we empirically compare ten individual L...

ID: 2510.21513v1 cs.SE, cs.CL, cs.LG

arXiv PDF

📄 SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations

2025-10-25

Авторы:

Amila Indika, Igor Molybog

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Numerous knowledge workers utilize spreadsheets in business, accounting, and finance. However, a lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have utilized Large Language Models (LLMs) for g...

ID: 2510.19864v1 cs.SE, cs.CL, cs.LG

arXiv PDF

📄 SIADAFIX: issue description response for adaptive program repair

2025-10-22

Авторы:

Xin Cao, Nan Yu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We propose utilizing fast and slow thinking to enhance the capabilities of large language model-based agents on complex tasks such as program repair. In particular, we design an adaptive program repair method based on issue description response, called SIADAFIX. The proposed method utilizes slow thinking bug fix agent to complete complex program repair tasks, and employs fast thinking workflow decision components to optimize and classify issue descriptions, using issue description response resul...

ID: 2510.16059v1 cs.SE, cs.CL, D.2.2; D.2.3

arXiv PDF

📄 Leveraging Test Driven Development with Large Language Models for Reliable and Verifiable Spreadsheet Code Generation: A Research Framework

2025-10-21

Авторы:

Dr Simon Thorne, Dr Advait Sarkar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models (LLMs), such as ChatGPT, are increasingly leveraged for generating both traditional software code and spreadsheet logic. Despite their impressive generative capabilities, these models frequently exhibit critical issues such as hallucinations, subtle logical inconsistencies, and syntactic errors, risks particularly acute in high stakes domains like financial modelling and scientific computations, where accuracy and reliability are paramount. This position paper proposes a st...

ID: 2510.15585v1 cs.SE, cs.CL, cs.PL, F.2.2; I.2.7

arXiv PDF

📄 Task-Aware Reduction for Scalable LLM-Database Systems

2025-10-16

Авторы:

Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models (LLMs) are increasingly applied to data-intensive workflows, from database querying to developer observability. Yet the effectiveness of these systems is constrained by the volume, verbosity, and noise of real-world text-rich data such as logs, telemetry, and monitoring streams. Feeding such data directly into LLMs is costly, environmentally unsustainable, and often misaligned with task objectives. Parallel efforts in LLM efficiency have focused on model- or architecture-le...

ID: 2510.11813v1 cs.SE, cs.CL, cs.DB

arXiv PDF

📄 A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

2025-10-15

Авторы:

Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Christian Jensen, Pietro Lio, Kwok-Yan Lam

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The integration of LLMs into software engineering has catalyzed a paradigm shift from traditional rule-based systems to sophisticated agentic systems capable of autonomous problem-solving. Despite this transformation, the field lacks a comprehensive understanding of how benchmarks and solutions interconnect, hindering systematic progress and evaluation. This survey presents the first holistic analysis of LLM-empowered software engineering, bridging the critical gap between evaluation and solutio...

ID: 2510.09721v1 cs.SE, cs.CL

arXiv PDF

📄 Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

2025-10-08

Авторы:

Yicheng Tao, Yao Qin, Yepang Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent advancements in large language models (LLMs) have substantially improved automated code generation. While function-level and file-level generation have achieved promising results, real-world software development typically requires reasoning across entire repositories. This gives rise to the challenging task of Repository-Level Code Generation (RLCG), where models must capture long-range dependencies, ensure global semantic consistency, and generate coherent code spanning multiple files or...

ID: 2510.04905v1 cs.SE, cs.CL

arXiv PDF

📄 When Names Disappear: Revealing What LLMs Actually Understand About Code

2025-10-07

Авторы:

Cuong Chi Le, Minh V. T. Pham, Cuong Duc Van, Hoang N. Phan, Huy N. Phan, Tien N. Nguyen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models (LLMs) achieve strong results on code tasks, but how they derive program meaning remains unclear. We argue that code communicates through two channels: structural semantics, which define formal behavior, and human-interpretable naming, which conveys intent. Removing the naming channel severely degrades intent-level tasks such as summarization, where models regress to line-by-line descriptions. Surprisingly, we also observe consistent reductions on execution tasks that shoul...

ID: 2510.03178v1 cs.SE, cs.CL

arXiv PDF

Показано 11 - 20 из 33 записей