Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

2512.04273v1 cs.SE, cs.AI 2025-12-05

Авторы:

Tyler Slater

Abstract

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting them to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Utilizing Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 demonstrated an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," where open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that without automated architectural linting, utilizing smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Automating Complex Document Workflows via Stepwise and Rollback-Enabled Operatio...

MANTRA: a Framework for Multi-stage Adaptive Noise TReAtment During Training

Beyond Greenfield: The D3 Framework for AI-Driven Productivity in Brownfield Eng...

LLM-as-a-Judge for Scalable Test Coverage Evaluation: Accuracy, Operational Reli...

An Empirical Study of Agent Developer Practices in AI Agent Frameworks

Навигация