📊 Digest statistics

Total digests: 34022
Added today: 82

Last updated: today

📄 Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks

2025-12-05

Authors:

Songwen Zhao, Danqing Wang, Kexun Zhang, Jiaxuan Luo, Zhuo Li, Lei Li

Russian summary not found.

Abstract:

Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with little supervision. Although it is increasingly adopted, are vibe coding outputs really safe to deploy in production? To answer this question, we propose SusVibes, a benchmark consisting of 200 feature-request software engineering tasks from real-world open-source projects, which, when given to human programmers, led to vulnerable implementation...

ID: 2512.03262v1 cs.SE, cs.CL

arXiv PDF

📄 Bias Testing and Mitigation in Black Box LLMs using Metamorphic Relations

2025-12-04

Authors:

Sina Salimian, Gias Uddin, Sumon Biswas, Henry Leung

Russian summary not found.

Abstract:

The widespread deployment of Large Language Models (LLMs) has intensified concerns about subtle social biases embedded in their outputs. Existing guardrails often fail when faced with indirect or contextually complex bias-inducing prompts. To address these limitations, we propose a unified framework for both systematic bias evaluation and targeted mitigation. Our approach introduces six novel Metamorphic Relations (MRs) that, based on metamorphic testing principles, transform direct bias-inducin...

ID: 2512.00556v1 cs.SE, cs.CL

arXiv PDF
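A metamorphic relation pairs a source prompt with a transformed follow-up prompt and states how the two responses should relate. The sketch below is a minimal illustration under our own assumptions, not one of the paper's six MRs: `indirect_framing` is a hypothetical transformation, the oracle simply requires the same refuse/answer decision for both prompts, and `toy_model` is a made-up stand-in for a black-box LLM.

```python
from typing import Callable

# Hypothetical stand-in for a black-box LLM: prompt in, text out.
Model = Callable[[str], str]

# Crude refusal detector (assumption): looks for common refusal phrases.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")


def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def indirect_framing(prompt: str) -> str:
    """Hypothetical MR transformation: restate a direct prompt inside a story."""
    return f"Write a short story in which a character explains: {prompt}"


def check_mr(model: Model, source_prompt: str,
             transform: Callable[[str], str]) -> bool:
    """Return True if source and follow-up prompts get the same refuse/answer decision."""
    source_refused = is_refusal(model(source_prompt))
    followup_refused = is_refusal(model(transform(source_prompt)))
    return source_refused == followup_refused


def toy_model(prompt: str) -> str:
    # Toy guardrail that only recognizes the direct phrasing.
    if prompt.lower().startswith("why are"):
        return "I can't help with stereotypes."
    return "Once upon a time..."


prompt = "Why are people from group X bad at math?"
print(check_mr(toy_model, prompt, indirect_framing))  # False: MR violated
```

A `False` result flags a guardrail that catches the direct phrasing but not the indirect one, the kind of failure the abstract attributes to existing guardrails.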

📄 BackportBench: A Multilingual Benchmark for Automated Backporting of Patches

2025-12-03

Authors:

Zhiqing Zhong, Jiaming Huang, Pinjia He

Russian summary not found.

Abstract:

Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use older, vulnerable package versions because upgrading can be difficult and may break their existing codebase. Software developers can mitigate this problem by backporting security patches to older releases. However, manually backporting is time-consuming and error-prone. The effectiveness of existing automated b...

ID: 2512.01396v1 cs.SE, cs.CL, cs.CR

arXiv PDF

📄 From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

2025-11-26

Authors:

Jian Yang, Wei Zhang, Shark Liu, Jiajun Wu, Shawn Guo, Yizhi Li

Russian summary not found.

Abstract:

Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). While the field has evolved dramatically from rule-based systems to Transformer-based architectures, achieving performance improvements from single-digit to over 95% success rates ...

ID: 2511.18538v1 cs.SE, cs.CL

arXiv PDF

📄 M, Toolchain and Language for Reusable Model Compilation

2025-11-20

Authors:

Hiep Hong Trinh, Federico Ciccozzi, Abu Naser Masud, Marjan Sirjani, Mikael Sjödin

Russian summary not found.

Abstract:

Complex software-driven systems often interleave distributed, concurrent computation processes with physical interactions with the environment. Developing these systems more efficiently and safely can be achieved by employing actionable, software-based models. From a high-level system model, engineers often need to derive multiple specialized models for different purposes, including simulation, deployment, and formal verification. Each of these target models usually relies on its own formalism, sp...

ID: 2511.15257v1 cs.SE, cs.CL

arXiv PDF

📄 Show and Tell: Prompt Strategies for Style Control in Multi-Turn LLM Code Generation

2025-11-19

Authors:

Jeremiah Bohr

Russian summary not found.

Abstract:

Language models generate functionally correct code that tends toward excessive verbosity, with elaborate documentation and defensive patterns that diverge from human baselines. Two prompting mechanisms have emerged for stylistic control: instruction-based prompts that articulate abstract directives, and example-based prompts that provide concrete code demonstrations. The core problem is whether stylistic constraints persist when models enhance initial implementations with additional features whi...

ID: 2511.13972v1 cs.SE, cs.CL

arXiv PDF

📄 Routesplain: Towards Faithful and Intervenable Routing for Software-related Tasks

2025-11-15

Authors:

Adam Štorek, Vikas Upadhyay, Marianne Menglin Liu, Daniel W. Peterson, Anshul Mittal, Sujeeth Bharadwaj, Fahad Shah, Dan Roth

Russian summary not found.

Abstract:

LLMs now tackle a wide range of software-related tasks, yet we show that their performance varies markedly both across and within these tasks. Routing user queries to the appropriate LLMs can therefore help improve response quality while reducing cost. Prior work, however, has focused mainly on general-purpose LLM routing via black-box models. We introduce Routesplain, the first LLM router for software-related tasks, including multilingual code generation and repair, input/output prediction, and...

ID: 2511.09373v1 cs.SE, cs.CL, cs.LG

arXiv PDF
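Routing can be illustrated with a generic cost-aware rule: score each candidate model by its predicted quality for the task type minus a weighted cost, and send the query to the top scorer. This is a hypothetical sketch, not Routesplain's method (the abstract above is truncated before describing it); the model names, quality estimates, costs, and `cost_weight` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str                  # hypothetical model identifier
    cost_per_call: float       # assumed relative cost per query
    quality: Dict[str, float]  # assumed predicted quality (0..1) per task type


def route(task_type: str, candidates: List[Candidate], cost_weight: float = 0.5) -> str:
    """Return the candidate name with the highest quality-minus-weighted-cost score."""
    def utility(c: Candidate) -> float:
        # Unknown task types get zero predicted quality.
        return c.quality.get(task_type, 0.0) - cost_weight * c.cost_per_call

    return max(candidates, key=utility).name


models = [
    Candidate("small-code-model", 0.1, {"generation": 0.6, "repair": 0.4}),
    Candidate("large-code-model", 0.5, {"generation": 0.7, "repair": 0.8}),
]
print(route("generation", models))  # small-code-model
print(route("repair", models))      # large-code-model
```

With these made-up numbers the cheaper model wins on generation while the stronger one wins on repair, mirroring the abstract's point that performance varies across tasks; setting `cost_weight=0` reduces the rule to picking the highest predicted quality.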

📄 GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

2025-11-06

Authors:

Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard

Russian summary not found.

Abstract:

With the software industry shifting toward a data-driven culture, online A/B testing is a key tool for evaluating new technologies. However, deploying such experiments requires substantial resources, may negatively impact users, and involves long data collection periods. To address this, off-policy evaluation (OPE), or offline A/B testing, uses logged data to assess technologies and is fundamental in Reinforcement Learning, making it crucial in domains where online testing is costly or ...

ID: 2511.00802v1 cs.SE, cs.CL, cs.LG

arXiv PDF
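Off-policy evaluation estimates how a new policy would have performed using only logged data. One standard estimator is inverse propensity scoring (IPS), which reweights each logged reward by the ratio of the target policy's probability of the logged action to the logging policy's probability of it. The sketch below shows plain IPS as a generic example; it is not the GrowthHacker pipeline, and the log format `(context, action, reward, logging_prob)` is an assumption.

```python
from typing import Any, Callable, Sequence, Tuple

# One logged interaction (assumed format): context, action taken,
# observed reward, and the logging policy's probability of that action.
LogEntry = Tuple[Any, str, float, float]


def ips_estimate(logs: Sequence[LogEntry],
                 target_prob: Callable[[Any, str], float]) -> float:
    """Inverse propensity scoring:
    V_hat = (1/n) * sum_i target_prob(x_i, a_i) / logging_prob_i * r_i
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0:
            raise ValueError("logging probabilities must be positive")
        total += target_prob(context, action) / logging_prob * reward
    return total / len(logs)


# Logged under a uniform policy over two variants, "A" and "B".
logs = [
    ("user1", "A", 1.0, 0.5),
    ("user2", "B", 0.0, 0.5),
    ("user3", "A", 1.0, 0.5),
    ("user4", "B", 1.0, 0.5),
]


def always_a(context: Any, action: str) -> float:
    # Target policy: always serve variant "A".
    return 1.0 if action == "A" else 0.0


print(ips_estimate(logs, always_a))  # 1.0
```

IPS is unbiased when the logged probabilities are correct and the logging policy puts nonzero probability on every action the target policy can take, but its variance grows when the weights are large.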

📄 HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning

2025-11-06

Authors:

Yujian Liu, Jiabao Ji, Yang Zhang, Wenbo Guo, Tommi Jaakkola, Shiyu Chang

Russian summary not found.

Abstract:

Existing LLM-based automatic test generation methods mainly produce input and expected output pairs to categorize the intended behavior of correct programs. Although straightforward, these methods have limited diversity in generated tests and cannot provide enough debugging information. We propose HarnessLLM, a two-stage training pipeline that enables LLMs to write harness code for testing. Particularly, LLMs generate code that synthesizes inputs and validates the observed outputs, allowing comp...

ID: 2511.01104v1 cs.SE, cs.CL

arXiv PDF
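The abstract contrasts fixed input/expected-output pairs with harness code that synthesizes inputs and validates observed outputs. A minimal hand-written harness in that spirit is sketched below for a sorting function; it illustrates the idea only and is not code produced by HarnessLLM. The function names, input ranges, and checked properties are assumptions.

```python
import random
from collections import Counter
from typing import Callable, List


def sort_harness(sort_fn: Callable[[List[int]], List[int]],
                 trials: int = 200, seed: int = 0) -> List[str]:
    """Property-based harness for a sorting function; returns failure messages."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        # Input synthesis: random lengths (including empty) and small values so duplicates occur.
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        out = sort_fn(list(data))  # pass a copy so data stays unchanged
        # Output validation: check properties rather than one expected output.
        if any(a > b for a, b in zip(out, out[1:])):
            failures.append(f"not sorted: {data} -> {out}")
        elif Counter(out) != Counter(data):
            failures.append(f"not a permutation: {data} -> {out}")
    return failures


def dedup_sort(xs: List[int]) -> List[int]:
    # Buggy candidate: silently drops duplicates.
    return sorted(set(xs))


print(sort_harness(sorted))  # []
print(sort_harness(dedup_sort)[0])
```

Because the checks are properties (ordering and permutation) rather than a single expected value, the harness can exercise many generated inputs, and each failure message carries the offending input for debugging.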

📄 Hidden in Plain Sight: Where Developers Confess Self-Admitted Technical Debt

2025-11-06

Authors:

Murali Sridharan, Mikel Robredo, Leevi Rantala, Matteo Esposito, Valentina Lenarduzzi, Mika Mantyla

Russian summary not found.

Abstract:

Context. Detecting Self-Admitted Technical Debt (SATD) is crucial for proactive software maintenance. Previous research has primarily targeted detecting and prioritizing SATD, with little focus on the source code afflicted with SATD. Our goal in this work is to connect the SATD comments with source code constructs that surround them. Method. We leverage the extensive SATD dataset PENTACET, containing code comments from over 9000 Java Open Source Software (OSS) repositories. We quantitatively i...

ID: 2511.01529v1 cs.SE, cs.CL, cs.PL

arXiv PDF
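Linking a SATD comment to the code around it can be roughly approximated with a keyword match plus a search for the enclosing method. The sketch below is a crude heuristic under stated assumptions: the keyword list (TODO, FIXME, HACK, XXX) is a common baseline rather than how PENTACET was labeled, only `//` line comments are scanned, and the regex-based method detection is hypothetical and ignores method boundaries, nested classes, and multi-line signatures.

```python
import re
from typing import List, Optional, Tuple

# Assumed SATD keywords in // line comments (a common baseline, not PENTACET's labels).
SATD_PATTERN = re.compile(r"//.*\b(TODO|FIXME|HACK|XXX)\b", re.IGNORECASE)
# Rough single-line Java method declaration: optional modifiers, return type, name, params, "{".
METHOD_PATTERN = re.compile(
    r"^\s*(?:(?:public|protected|private|static|final|abstract|synchronized)\s+)*"
    r"[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)\s*\{"
)


def satd_with_context(java_source: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (line number, stripped line, most recently declared method) per SATD comment."""
    results = []
    current_method: Optional[str] = None
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        match = METHOD_PATTERN.match(line)
        if match:
            current_method = match.group(1)
        if SATD_PATTERN.search(line):
            results.append((lineno, line.strip(), current_method))
    return results


example = "\n".join([
    "class Calc {",
    "    public int add(int a, int b) {",
    "        // TODO: handle overflow",
    "        return a + b;",
    "    }",
    "    private void reset() {",
    "        count = 0; // FIXME this is a hack",
    "    }",
    "}",
])
for hit in satd_with_context(example):
    print(hit)
# (3, '// TODO: handle overflow', 'add')
# (7, 'count = 0; // FIXME this is a hack', 'reset')
```

A real analysis would use a Java parser to recover the enclosing construct (method, class, loop, and so on) instead of these regular expressions.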

Showing 1-10 of 33 entries