📊 Digest statistics

Total digests: 34022 Added today: 82

Last updated: today

📄 Who is a Better Matchmaker? Human vs. Algorithmic Judge Assignment in a High-Stakes Startup Competition

2025-10-16

Authors:

Sarina Xi, Orelia Pi, Miaomiao Zhang, Becca Xiong, Jacqueline Ng Lane, Nihar B. Shah

Abstract:

There is growing interest in applying artificial intelligence (AI) to automate and support complex decision-making tasks. However, it remains unclear how algorithms compare to human judgment in contexts requiring semantic understanding and domain expertise. We examine this in the context of the judge assignment problem, matching submissions to suitably qualified judges. Specifically, we tackled this problem at the Harvard President's Innovation Challenge, the university's premier venture competi...

ID: 2510.12692v1 cs.HC, cs.AI, cs.CL, cs.CY, cs.LG

arXiv PDF

📄 When or What? Understanding Consumer Engagement on Digital Platforms

2025-10-15

Authors:

Jingyi Wu, Junying Liang

Abstract:

Understanding what drives popularity is critical in today's digital service economy, where content creators compete for consumer attention. Prior studies have primarily emphasized the role of content features, yet creators often misjudge what audiences actually value. This study applies Latent Dirichlet Allocation (LDA) modeling to a large corpus of TED Talks, treating the platform as a case of digital service provision in which creators (speakers) and consumers (audiences) interact. By comparin...

ID: 2510.10474v1 cs.CL, cs.CY

arXiv PDF

📄 Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models

2025-10-15

Authors:

Georg Ahnert, Anna-Carolina Haensch, Barbara Plank, Markus Strohmaier

Abstract:

Many in-silico simulations of human survey responses with large language models (LLMs) focus on generating closed-ended survey responses, whereas LLMs are typically trained to generate open-ended text instead. Previous research has used a diverse range of methods for generating closed-ended survey responses with LLMs, and a standard practice remains to be identified. In this paper, we systematically investigate the impact that various Survey Response Generation Methods have on predicted survey r...

ID: 2510.11586v1 cs.CL, cs.CY

arXiv PDF

📄 McMining: Automated Discovery of Misconceptions in Student Code

2025-10-14

Authors:

Erfan Al-Hossami, Razvan Bunescu

Abstract:

When learning to code, students often develop misconceptions about various programming language concepts. These can not only lead to bugs or inefficient code, but also slow down the learning of related concepts. In this paper, we introduce McMining, the task of mining programming misconceptions from samples of code from a student. To enable the training and evaluation of McMining systems, we develop an extensible benchmark dataset of misconceptions together with a large set of code samples where...

ID: 2510.08827v1 cs.SE, cs.AI, cs.CL, cs.CY

arXiv PDF

📄 Textual Entailment and Token Probability as Bias Evaluation Metrics

2025-10-11

Authors:

Virginia K. Felkner, Allison Lim, Jonathan May

Abstract:

Measurement of social bias in language models is typically done with token probability (TP) metrics, which are broadly applicable but have been criticized for their distance from real-world language model use cases and harms. In this work, we test natural language inference (NLI) as a more realistic alternative bias metric. We show that, curiously, NLI and TP bias evaluation behave substantially differently, with very low correlation among different NLI metrics and between NLI and TP metrics. We find ...

ID: 2510.07662v1 cs.CL, cs.CY, I.2.7; K.4.2

arXiv PDF

📄 Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing

2025-10-11

Authors:

Haoyang Gui, Thales Bertaglia, Taylor Annabell, Catalina Goanta, Tjomme Dooper, Gerasimos Spanakis

Abstract:

The rise of influencer marketing has blurred boundaries between organic content and sponsored content, making the enforcement of legal rules relating to transparency challenging. Effective regulation requires applying legal knowledge with a clear purpose and reason, yet current detection methods of undisclosed sponsored content generally lack legal grounding or operate as opaque "black boxes". Using 1,143 Instagram posts, we compare gpt-5-nano and gemini-2.5-flash-lite under three prompting stra...

ID: 2510.08111v1 cs.CL, cs.CY

arXiv PDF

📄 Evaluation of LLMs for Process Model Analysis and Optimization

2025-10-11

Authors:

Akhil Kumar, Jianliang Leon Zhao, Om Dobariya

Abstract:

In this paper, we report our experience with several LLMs for their ability to understand a process model in an interactive, conversational style, find syntactical and logical errors in it, and reason with it in depth through a natural language (NL) interface. Our findings show that a vanilla, untrained LLM like ChatGPT (model o3) in a zero-shot setting is effective in understanding BPMN process models from images and answering queries about them intelligently at syntactic, logic, and semantic l...

ID: 2510.07489v1 cs.AI, cs.CL, cs.CY, cs.IR, cs.LG

arXiv PDF

📄 Multimodal Safety Evaluation in Generative Agent Social Simulations

2025-10-11

Authors:

Alhim Vera, Karen Sanchez, Carlos Hinojosa, Haidar Bin Hamid, Donghoon Kim, Bernard Ghanem

Abstract:

Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence, and trust across modalities remains limited. We introduce a reproducible simulation framework for evaluating agents along three dimensions: (1) safety improvement over time, including iterative plan revisions in text-visual scenarios; (2) detection of un...

ID: 2510.07709v1 cs.AI, cs.CL, cs.CY, cs.MA

arXiv PDF

📄 VideoNorms: Benchmarking Cultural Awareness of Video Language Models

2025-10-11

Authors:

Nikhil Reddy Varimalla, Yunfei Xu, Arkadiy Saakyan, Meng Fan Wang, Smaranda Muresan

Abstract:

As Video Large Language Models (VideoLLMs) are deployed globally, they require understanding of and grounding in the relevant cultural background. To properly assess these models' cultural awareness, adequate benchmarks are needed. We introduce VideoNorms, a benchmark of over 1000 (video clip, norm) pairs from US and Chinese cultures annotated with socio-cultural norms grounded in speech act theory, norm adherence and violations labels, and verbal and non-verbal evidence. To build VideoNorms, we...

ID: 2510.08543v1 cs.CV, cs.AI, cs.CL, cs.CY

arXiv PDF

📄 Machines in the Crowd? Measuring the Footprint of Machine-Generated Text on Reddit

2025-10-10

Authors:

Lucio La Cava, Luca Maria Aiello, Andrea Tagarelli

Abstract:

Generative Artificial Intelligence is reshaping online communication by enabling large-scale production of Machine-Generated Text (MGT) at low cost. While its presence is rapidly growing across the Web, little is known about how MGT integrates into social media environments. In this paper, we present the first large-scale characterization of MGT on Reddit. Using a state-of-the-art statistical method for detection of MGT, we analyze over two years of activity (2022-2024) across 51 subreddits repr...

ID: 2510.07226v1 cs.SI, cs.CL, cs.CY, physics.soc-ph

arXiv PDF

Showing 61 - 70 of 137 entries