📊 Digest statistics

Total digests: 34022 | Added today: 82

Last updated: today

📄 Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage

2025-12-05

Authors:

Nolan Platt, Ethan Luchs, Sehrish Nizamani

Abstract:

Usability evaluations are essential for ensuring that modern interfaces meet user needs, yet traditional heuristic evaluations by human experts can be time-consuming and subjective, especially early in development. This paper investigates whether large language models (LLMs) can provide reliable and consistent heuristic assessments at the development stage. By applying Jakob Nielsen's ten usability heuristics to thirty open-source websites, we generated over 850 heuristic evaluations in three in...

ID: 2512.04262v1 cs.SE, cs.AI, cs.HC

arXiv PDF

📄 Hey GPT-OSS, Looks Like You Got It - Now Walk Me Through It! An Assessment of the Reasoning Language Models Chain of Thought Mechanism for Digital Forensics

2025-12-05

Authors:

Gaëtan Michelet, Janine Schneider, Aruna Withanage, Frank Breitinger

Abstract:

The use of large language models in digital forensics has been widely explored. Beyond identifying potential applications, research has also focused on optimizing model performance for forensic tasks through fine-tuning. However, limited result explainability reduces their operational and legal usability. Recently, a new class of reasoning language models has emerged, designed to handle logic-based tasks through an 'internal reasoning' mechanism. Yet, users typically see only the final answer, n...

ID: 2512.04254v1 cs.CR, cs.AI

arXiv PDF

📄 Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

2025-12-05

Authors:

Tyler Slater

Abstract:

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet...

ID: 2512.04273v1 cs.SE, cs.AI

arXiv PDF

📄 The Initialization Determines Whether In-Context Learning Is Gradient Descent

2025-12-05

Authors:

Shifeng Xie, Rui Yuan, Simone Rossi, Thomas Hannagan

Abstract:

In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD); this connection has primarily been established under simplified conditions with zero-mean Gaussian priors and zero initialization for GD. However, subsequent studies have challenged this simplified view by highlighting its overly restrictive assumptions, demonstrating instea...

ID: 2512.04268v1 cs.LG, cs.AI

arXiv PDF

📄 The Geometry of Benchmarks: A New Path Toward AGI

2025-12-05

Authors:

Przemyslaw Chojecki

Abstract:

Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI...

ID: 2512.04276v1 cs.AI, cs.LG, math.ST

arXiv PDF

📄 Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order

2025-12-05

Authors:

Prakhar Gupta, Vaibhav Gupta

Abstract:

Post-training with reinforcement learning (RL) typically optimizes a single scalar objective and ignores structure in how solutions are produced. We ask whether a scalar hint toward a canonical solver ordering, used only during RL post-training, improves performance even when fine-tuned on randomized solution sequences. On Sudoku, we train a Transformer with standard fine-tuning on randomized solving orders, then post-train it with Group Relative Policy Optimization (GRPO) with two rewards: cell...

ID: 2512.04277v1 cs.LG, cs.AI

arXiv PDF

📄 Learning Single-Image Super-Resolution in the JPEG Compressed Domain

2025-12-05

Authors:

Sruthi Srinivasan, Elham Shakibapour, Rajy Rawther, Mehdi Saeedi

Abstract:

Deep learning models have grown increasingly complex, with input data sizes scaling accordingly. Despite substantial advances in specialized deep learning hardware, data loading continues to be a major bottleneck that limits training and inference speed. To address this challenge, we propose training models directly on encoded JPEG features, reducing the computational overhead associated with full JPEG decoding and significantly improving data loading efficiency. While prior works have focused o...

ID: 2512.04284v1 cs.CV, cs.AI

arXiv PDF

📄 Towards better dense rewards in Reinforcement Learning Applications

2025-12-05

Authors:

Shuyuan Zhang

Abstract:

Finding meaningful and accurate dense rewards is a fundamental task in the field of reinforcement learning (RL) that enables agents to explore environments more efficiently. In traditional RL settings, agents learn optimal policies through interactions with an environment guided by reward signals. However, when these signals are sparse, delayed, or poorly aligned with the intended task objectives, agents often struggle to learn effectively. Dense reward functions, which provide informative feedb...

ID: 2512.04302v1 cs.AI

arXiv PDF

📄 Artificial Intelligence Applications in Horizon Scanning for Infectious Diseases

2025-12-05

Authors:

Ian Miles, Mayumi Wakimoto, Wagner Meira, Daniela Paula, Daylene Ticiane, Bruno Rosa, Jane Biddulph, Stelios Georgiou, Valdir Ermida

Abstract:

This review explores the integration of Artificial Intelligence into Horizon Scanning, focusing on identifying and responding to emerging threats and opportunities linked to Infectious Diseases. We examine how AI tools can enhance signal detection, data monitoring, scenario analysis, and decision support. We also address the risks associated with AI adoption and propose strategies for effective implementation and governance. The findings contribute to the growing body of Foresight literature by ...

ID: 2512.04287v1 cs.AI, q-bio.PE

arXiv PDF

📄 Evaluating Long-Context Reasoning in LLM-Based WebAgents

2025-12-05

Authors:

Andy Chung, Yichi Zhang, Kaixiang Lin, Aditya Rawal, Qiaozi Gao, Joyce Chai

Abstract:

As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating long context reasoning capabil...

ID: 2512.04307v1 cs.LG, cs.AI

arXiv PDF

Showing 131-140 of 14425 records