AI-Based Measurement of Innovation: Mapping Expert Insight into Large Language Model Applications

2508.02430v1 cs.CL 2025-08-09

Authors:

Robin Nowak, Patrick Figge, Carolin Haeussler

Summary

Measuring innovation often relies on context-dependent proxies and expert evaluations, which limits empirical innovation research. The article proposes a framework based on large language models (LLMs) that automates innovation measurement by approximating expert assessments from unstructured text data. This approach removes the constraints of labor-intensive and costly expert evaluation. Two studies test the performance and reliability of the LLM framework in two domains: (1) the innovativeness of software application updates and (2) the originality of user feedback and product improvement ideas in reviews. The framework achieved higher F1-scores than alternative measures from prior studies and state-of-the-art ML/DL models, and its results were highly consistent across runs. Key takeaway: LLMs can substantially improve innovation measurement, making it accessible, reliable, and efficient for R&D teams, researchers, and reviewers. The authors also discuss the impact of key design decisions such as model selection, prompt engineering, training data size and distribution, and parameter settings.

Abstract

Measuring innovation often relies on context-specific proxies and on expert evaluation. Hence, empirical innovation research is often limited to settings where such data is available. We investigate how large language models (LLMs) can be leveraged to overcome the constraints of manual expert evaluations and assist researchers in measuring innovation. We design an LLM framework that reliably approximates domain experts' assessment of innovation from unstructured text data. We demonstrate the performance and broad applicability of this framework through two studies in different contexts: (1) the innovativeness of software application updates and (2) the originality of user-generated feedback and improvement ideas in product reviews. We compared the performance (F1-score) and reliability (consistency rate) of our LLM framework against alternative measures used in prior innovation studies and against state-of-the-art machine learning- and deep learning-based models. The LLM framework achieved higher F1-scores than the other approaches, and its results are highly consistent (i.e., results do not change across runs). This article equips R&D personnel in firms, as well as researchers, reviewers, and editors, with the knowledge and tools to effectively use LLMs for measuring innovation and to evaluate the performance of LLM-based innovation measures. In doing so, we discuss the impact of important design decisions (model selection, prompt engineering, training data size, training data distribution, and parameter settings) on performance and reliability. Given the challenges inherent in using human expert evaluation and existing text-based measures, our framework has important implications for harnessing LLMs as reliable, increasingly accessible, and broadly applicable research tools for measuring innovation.
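The abstract evaluates the framework on two metrics: F1-score against expert labels and a consistency rate across repeated runs. The following is a minimal sketch of how both could be computed for binary labels (1 = innovative, 0 = not). It assumes the consistency rate is the share of items that receive an identical label in every run, which is one reading of "results do not change across runs" and not necessarily the authors' exact definition. The function names and toy labels are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (e.g. 1 = 'innovative' per the expert label)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def consistency_rate(runs: Sequence[Sequence[int]]) -> float:
    """Assumed definition: share of items labelled identically in every run."""
    n_items = len(runs[0])
    if any(len(r) != n_items for r in runs):
        raise ValueError("all runs must label the same items")
    # zip(*runs) yields, per item, the tuple of labels it received across runs.
    same = sum(1 for labels in zip(*runs) if len(set(labels)) == 1)
    return same / n_items


if __name__ == "__main__":
    # Hypothetical toy data: expert labels and three repeated LLM runs.
    expert = [1, 0, 1, 1, 0]
    run1 = [1, 0, 1, 0, 0]
    run2 = [1, 0, 1, 0, 0]
    run3 = [1, 1, 1, 0, 1]
    print(round(f1_score(expert, run1), 3))             # 0.8
    print(round(consistency_rate([run1, run2, run3]), 3))  # 0.6
```

In this toy example run1 finds two of the three expert-labelled innovative items with no false positives (precision 1.0, recall 2/3), and two of the five items change label across runs, giving a consistency rate of 0.6.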
