📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Automating Complex Document Workflows via Stepwise and Rollback-Enabled Operation Orchestration

2025-12-06

Авторы:

Yanbin Zhang, Hanhui Ye, Yue Bai, Qiming Zhang, Liao Xiang, Wu Mianzhi, Renjun Hu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Workflow automation promises substantial productivity gains in everyday document-related tasks. While prior agentic systems can execute isolated instructions, they struggle with automating multi-step, session-level workflows due to limited control over the operational process. To this end, we introduce AutoDW, a novel execution framework that enables stepwise, rollback-enabled operation orchestration. AutoDW incrementally plans API actions conditioned on user instructions, intent-filtered API ca...

ID: 2512.04445v1 cs.SE, cs.AI

arXiv PDF

📄 Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap

2025-12-06

Авторы:

Jialong Li, Mingyue Zhang, Nianyu Li, Danny Weyns, Zhi Jin, Kenji Tei

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a feedback loop with four core functionalities: monitoring, analyzing, planning, and execution. Recently, generative artificial intelligence (GenAI), especially the area of large language models, has shown impressive performance in data comprehension and logical reasoning. These capabilities are highly aligned with the functionalities required in SASs, suggesting a strong potential to employ GenAI to enhance SA...

ID: 2512.04680v1 cs.SE, cs.AI, cs.HC

arXiv PDF

📄 Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage

2025-12-05

Авторы:

Nolan Platt, Ethan Luchs, Sehrish Nizamani

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Usability evaluations are essential for ensuring that modern interfaces meet user needs, yet traditional heuristic evaluations by human experts can be time-consuming and subjective, especially early in development. This paper investigates whether large language models (LLMs) can provide reliable and consistent heuristic assessments at the development stage. By applying Jakob Nielsen's ten usability heuristics to thirty open-source websites, we generated over 850 heuristic evaluations in three in...

ID: 2512.04262v1 cs.SE, cs.AI, cs.HC

arXiv PDF

📄 Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

2025-12-05

Авторы:

Tyler Slater

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet...

ID: 2512.04273v1 cs.SE, cs.AI

arXiv PDF

📄 MANTRA: a Framework for Multi-stage Adaptive Noise TReAtment During Training

2025-12-05

Авторы:

Zixiao Zhao, Fatemeh H. Fard, Jie JW Wu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The reliable application of deep learning models to software engineering tasks hinges on high-quality training data. Yet, large-scale repositories inevitably introduce noisy or mislabeled examples that degrade both accuracy and robustness. While Noise Label Learning (NLL) has been extensively studied in other fields, there are a few works that investigate NLL in Software Engineering (SE) and Large Language Models (LLMs) for SE tasks. In this work, we propose MANTRA, a Multi-stage Adaptive Noise ...

ID: 2512.04319v1 cs.SE, cs.AI

arXiv PDF

📄 Beyond Greenfield: The D3 Framework for AI-Driven Productivity in Brownfield Engineering

2025-12-04

Авторы:

Krishna Kumaar Sharma

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Brownfield engineering work involving legacy systems, incomplete documentation, and fragmented architectural knowledge poses unique challenges for the effective use of large language models (LLMs). Prior research has largely focused on greenfield or synthetic tasks, leaving a gap in structured workflows for complex, context-heavy environments. This paper introduces the Discover-Define-Deliver (D3) Framework, a disciplined LLM-assisted workflow that combines role-separated prompting strategies wi...

ID: 2512.01155v2 cs.SE, cs.AI

arXiv PDF

📄 LLM-as-a-Judge for Scalable Test Coverage Evaluation: Accuracy, Operational Reliability, and Cost

2025-12-04

Авторы:

Donghao Huang, Shila Chew, Anna Dutkiewicz, Zhaoxia Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Assessing software test coverage at scale remains a bottleneck in QA pipelines. We present LLM-as-a-Judge (LAJ), a production-ready, rubric-driven framework for evaluating Gherkin acceptance tests with structured JSON outputs. Across 20 model configurations (GPT-4, GPT-5 with varying reasoning effort, and open-weight models) on 100 expert-annotated scripts over 5 runs (500 evaluations), we provide the first comprehensive analysis spanning accuracy, operational reliability, and cost. We introduce...

ID: 2512.01232v1 cs.SE, cs.AI

arXiv PDF

📄 Teaching an Online Multi-Institutional Research Level Software Engineering Course with Industry - an Experience Report

2025-12-04

Авторы:

Pankaj Jalote, Y. Raghu Reddy, Vasudeva Varma

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Covid has made online teaching and learning acceptable and students, faculty, and industry professionals are all comfortable with this mode. This comfort can be leveraged to offer an online multi-institutional research-level course in an area where individual institutions may not have the requisite faculty to teach and/or research students to enroll. If the subject is of interest to industry, online offering also allows industry experts to contribute and participate with ease. Advanced topics in...

ID: 2512.01523v1 cs.SE, cs.AI, cs.LG

arXiv PDF

📄 An Empirical Study of Agent Developer Practices in AI Agent Frameworks

2025-12-04

Авторы:

Yanlin Wang, Xinyi Xu, Jiachi Chen, Tingting Bi, Wenchao Gu, Zibin Zheng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that provide standardized components, abstractions, and orchestration mechanisms to simplify agent development. Despite widespread use of agent frameworks, their practical applications and how they influence the agent development process remain underexplored. Different agent frameworks encounter similar problems ...

ID: 2512.01939v1 cs.SE, cs.AI

arXiv PDF

📄 Bin2Vec: Interpretable and Auditable Multi-View Binary Analysis for Code Plagiarism Detection

2025-12-04

Авторы:

Moussa Moussaoui, Tarik Houichime, Abdelalim Sadiq

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce Bin2Vec, a new framework that helps compare software programs in a clear and explainable way. Instead of focusing only on one type of information, Bin2Vec combines what a program looks like (its built-in functions, imports, and exports) with how it behaves when it runs (its instructions and memory usage). This gives a more complete picture when deciding whether two programs are similar or not. Bin2Vec represents these different types of information as views that can be inspected sep...

ID: 2512.02197v1 cs.SE, cs.AI

arXiv PDF

Показано 1 - 10 из 341 записей