📊 Статистика дайджестов

Всего дайджестов: 34123 Добавлено сегодня: 101

Последнее обновление: сегодня

📄 David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation

2025-10-19

Авторы:

Philipp Bauerfeind, Amir Salarpour, David Fernandez, Pedram MohajerAnsari, Johannes Reschke, Mert D. Pesé

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Scenario simulation is central to testing autonomous driving systems. Scenic, a domain-specific language (DSL) for CARLA, enables precise and reproducible scenarios, but NL-to-Scenic generation with large language models (LLMs) suffers from scarce data, limited reproducibility, and inconsistent metrics. We introduce NL2Scenic, an open dataset and framework with 146 NL/Scenic pairs, a difficulty-stratified 30-case test split, an Example Retriever, and 14 prompting variants (ZS, FS, CoT, SP, MoT)....

ID: 2510.14115v1 cs.SE, cs.LG

arXiv PDF

📄 Leveraging Code Cohesion Analysis to Identify Source Code Supply Chain Attacks

2025-10-18

Авторы:

Maor Reuben, Ido Mendel, Or Feldman, Moshe Kravchik, Mordehai Guri, Rami Puzis

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Supply chain attacks significantly threaten software security with malicious code injections within legitimate projects. Such attacks are very rare but may have a devastating impact. Detecting spurious code injections using automated tools is further complicated as it often requires deciphering the intention of both the inserted code and its context. In this study, we propose an unsupervised approach for highlighting spurious code injections by quantifying cohesion disruptions in the source code...

ID: 2510.14778v1 cs.SE, cs.LG

arXiv PDF

📄 Instruction Set Migration at Warehouse Scale

2025-10-18

Авторы:

Eric Christopher, Kevin Crossan, Wolff Dobson, Chris Kennelly, Drew Lewis, Kun Lin, Martin Maas, Parthasarathy Ranganathan, Emma Rapati, Brian Yang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Migrating codebases from one instruction set architecture (ISA) to another is a major engineering challenge. A recent example is the adoption of Arm (in addition to x86) across the major Cloud hyperscalers. Yet, this problem has seen limited attention by the academic community. Most work has focused on static and dynamic binary translation, and the traditional conventional wisdom has been that this is the primary challenge. In this paper, we show that this is no longer the case. Modern ISA mig...

ID: 2510.14928v1 cs.SE, cs.LG

arXiv PDF

📄 On Pretraining for Project-Level Code Completion

2025-10-17

Авторы:

Maksim Sapronov, Evgeniy Glukhov

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Repository-level pretraining is commonly used to enable large language models for code to leverage codebase-wide context. This enhances their ability to generate accurate and context-aware code completions. In this work, we investigate how different repository-processing strategies affect in-context learning in OpenCoder, a 1.5B-parameter model. We extend its context window from 4,096 to 16,384 tokens by training on additional 1B tokens of curated repository-level data. Despite relying on a smal...

ID: 2510.13697v1 cs.SE, cs.LG

arXiv PDF

📄 Grounded AI for Code Review: Resource-Efficient Large-Model Serving in Enterprise Pipelines

2025-10-16

Авторы:

Sayan Mandal, Hua Jiang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Automated code review adoption lags in compliance-heavy settings, where static analyzers produce high-volume, low-rationale outputs, and naive LLM use risks hallucination and incurring cost overhead. We present a production system for grounded, PR-native review that pairs static-analysis findings with AST-guided context extraction and a single-GPU, on-demand serving stack (quantized open-weight model, multi-tier caching) to deliver concise explanations and remediation guidance. Evaluated on safe...

ID: 2510.10290v1 cs.SE, cs.LG

arXiv PDF

📄 Diff-XYZ: A Benchmark for Evaluating Diff Understanding

2025-10-16

Авторы:

Evgeniy Glukhov, Michele Conti, Egor Bogomolov, Yaroslav Golubev, Alexander Bezzubov

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reliable handling of code diffs is central to agents that edit and refactor repositories at scale. We introduce Diff-XYZ, a compact benchmark for code-diff understanding with three supervised tasks: apply (old code $+$ diff $\rightarrow$ new code), anti-apply (new code $-$ diff $\rightarrow$ old code), and diff generation (new code $-$ old code $\rightarrow$ diff). Instances in the benchmark are triples $\langle \textit{old code}, \textit{new code}, \textit{diff} \rangle$ drawn from real commits...

ID: 2510.12487v1 cs.SE, cs.LG

arXiv PDF

📄 Mechanistic Interpretability of Code Correctness in LLMs via Sparse Autoencoders

2025-10-07

Авторы:

Kriz Tahimic, Charibeth Cheng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As Large Language Models become integral to software development, with substantial portions of AI-suggested code entering production, understanding their internal correctness mechanisms becomes critical for safe deployment. We apply sparse autoencoders to decompose LLM representations, identifying directions that correspond to code correctness. We select predictor directions using t-statistics and steering directions through separation scores from base model representations, then analyze their m...

ID: 2510.02917v1 cs.SE, cs.LG

arXiv PDF

📄 Towards Verified Code Reasoning by LLMs

2025-10-02

Авторы:

Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José Cambronero, Michele Tufano, Satish Chandra

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software engineer understand a new code base, (2) helping a software engineer during code review sessions, and (3) ensuring that the code generated by an automated code generation system meets certain requirements (e.g. fixes a bug, improves readability, implements a feature...

ID: 2509.26546v1 cs.SE, cs.LG

arXiv PDF

📄 Influence-Guided Concolic Testing of Transformer Robustness

2025-10-01

Авторы:

Chih-Duo Hong, Yu Wang, Yao-Chen Chang, Fang Yu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Concolic testing for deep neural networks alternates concrete execution with constraint solving to search for inputs that flip decisions. We present an {influence-guided} concolic tester for Transformer classifiers that ranks path predicates by SHAP-based estimates of their impact on the model output. To enable SMT solving on modern architectures, we prototype a solver-compatible, pure-Python semantics for multi-head self-attention and introduce practical scheduling heuristics that temper constr...

ID: 2509.23806v1 cs.SE, cs.LG

arXiv PDF

📄 Learning-Based Testing for Deep Learning: Enhancing Model Robustness with Adversarial Input Prioritization

2025-10-01

Авторы:

Sheikh Md Mushfiqur Rahman, Nasir Eisty

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Context: Deep Neural Networks (DNNs) are increasingly deployed in critical applications, where resilience against adversarial inputs is paramount. However, whether coverage-based or confidence-based, existing test prioritization methods often fail to efficiently identify the most fault-revealing inputs, limiting their practical effectiveness. Aims: This project aims to enhance fault detection and model robustness in DNNs by integrating Learning-Based Testing (LBT) with hypothesis and mutation te...

ID: 2509.23961v1 cs.SE, cs.LG

arXiv PDF

Показано 21 - 30 из 55 записей