📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 0

Последнее обновление: сегодня

📄 Decoding-Free Sampling Strategies for LLM Marginalization

2025-10-25

Авторы:

David Pohl, Marco Cognetta, Junyoung Lee, Naoaki Okazaki

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the probability of only the specific tokenization produced as the output, despite there being many possible ways to represent the same text with a subword vocabulary. Recent studies have argued instead for evaluating LLMs by marginalization - the probability mass of al...

ID: 2510.20208v1 cs.CL, I.2.7

arXiv PDF

📄 Adapting Multilingual Models to Code-Mixed Tasks via Model Merging

2025-10-24

Авторы:

Prashant Kodali, Vaishnavi Shivkumar, Swarang Joshi, Monojit Choudhary, Ponnurangam Kumaraguru, Manish Shrivastava

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We study model merging as a practical alternative to conventional adaptation strategies for code-mixed NLP. Starting from a multilingual base model, we: (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge checkpoint with the base model, and (iii) fine-tune (FT) on the downstream task data. We evaluate our approach for sentence classification (sentiment and hate speech) task in English-Hindi (En-Hi) and English-Spanish (En-Es) using XL...

ID: 2510.19782v2 cs.CL, I.2.7

arXiv PDF

📄 Qomhra: A Bilingual Irish-English Large Language Model

2025-10-22

Авторы:

Joseph McInerney

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper introduces Qomhr\'a, a bilingual Irish-English large language model (LLM), developed under low-resource constraints presenting a complete pipeline spanning bilingual continued pre-training, instruction tuning, and alignment from human preferences. Newly accessible Irish corpora and English text are mixed and curated to improve Irish performance while preserving English ability. 6 closed-weight LLMs are judged for their Irish text generation by a native speaker, a learner and other LLM...

ID: 2510.17652v1 cs.CL, I.2.7

arXiv PDF

📄 Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?

2025-10-21

Авторы:

Ashutosh Bajpai, Tanmoy Chakraborty

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The increasing acceptance of large language models (LLMs) as an alternative to knowledge sources marks a significant paradigm shift across various domains, including time-sensitive fields such as law, healthcare, and finance. To fulfill this expanded role, LLMs must not only be factually accurate but also demonstrate consistency across temporal dimensions, necessitating robust temporal reasoning capabilities. Despite this critical requirement, efforts to ensure temporal consistency in LLMs remai...

ID: 2510.15513v1 cs.CL, I.2.7

arXiv PDF

📄 DROID: Dual Representation for Out-of-Scope Intent Detection

2025-10-18

Авторы:

Wael Rashwan, Hossam M. Zawbaa, Sourav Dutta, Haytham Assem

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Detecting out-of-scope (OOS) user utterances remains a key challenge in task-oriented dialogue systems and, more broadly, in open-set intent recognition. Existing approaches often depend on strong distributional assumptions or auxiliary calibration modules. We present DROID (Dual Representation for Out-of-Scope Intent Detection), a compact end-to-end framework that combines two complementary encoders -- the Universal Sentence Encoder (USE) for broad semantic generalization and a domain-adapted T...

ID: 2510.14110v1 cs.CL, I.2.7, I.5.1

arXiv PDF

📄 COSTAR-A: A prompting framework for enhancing Large Language Model performance on Point-of-View questions

2025-10-16

Авторы:

Nzubechukwu C. Ohalete, Kevin B. Gittner, Lauren M. Matheny

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models (LLMs) are highly sensitive to prompt design, and making optimized prompting techniques is crucial for generating consistent, high-quality outputs. In this study, we introduce COSTAR-A, a novel prompt engineering framework that enhances the existing COSTAR method, which stands for Context, Objective, Style, Tone, Audience, and Response, by adding the 'Answer' component at the end. We demonstrate that while the original COSTAR framework improves prompt clarity and aligns out...

ID: 2510.12637v1 cs.CL, I.2.7

arXiv PDF

📄 Augmenting Dialog with Think-Aloud Utterances for Modeling Individual Personality Traits by LLM

2025-10-14

Авторы:

Seiya Ishikura, Hiroaki Yamada, Tatsuya Hiraoka, Hiroaki Yamada, Takenobu Tokunaga

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This study proposes augmenting dialog data with think-aloud utterances (TAUs) for modeling individual personalities in text chat by LLM. TAU is a verbalization of a speaker's thought before articulating the utterance. We expect "persona LLMs" trained with TAU-augmented data can mimic the speaker's personality trait better. We tested whether the trained persona LLMs obtain the human personality with respect to Big Five, a framework characterizing human personality traits from five aspects. The re...

ID: 2510.09158v1 cs.CL, I.2.7

arXiv PDF

📄 Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion

2025-10-14

Авторы:

Ruitong Liu, Yan Wen, Te Sun, Yunjia Wu, Pingyang Huang, Zihang Yu, Siyuan Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Fusing Knowledge Graphs with Large Language Models is crucial for knowledge-intensive tasks like knowledge graph completion. The prevailing paradigm, prefix-tuning, simply concatenates knowledge embeddings with text inputs. However, this shallow fusion overlooks the rich relational semantics within KGs and imposes a significant implicit reasoning burden on the LLM to correlate the prefix with the text. To address these, we propose Semantic-condition Tuning (SCT), a new knowledge injection paradi...

ID: 2510.08966v1 cs.AI, cs.CL, I.2.7

arXiv PDF

📄 Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation

2025-10-11

Авторы:

Adam Dejl, James Barry, Alessandra Pascale, Javier Carnerero Cano

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Despite demonstrating remarkable performance across a wide range of tasks, large language models (LLMs) have also been found to frequently produce outputs that are incomplete or selectively omit key information. In sensitive domains, such omissions can result in significant harm comparable to that posed by factual inaccuracies, including hallucinations. In this study, we address the challenge of evaluating the comprehensiveness of LLM-generated texts, focusing on the detection of missing informa...

ID: 2510.07926v1 cs.CL, I.2.7

arXiv PDF

📄 Test-Time Scaling of Reasoning Models for Machine Translation

2025-10-10

Авторы:

Zihao Li, Shaoxiong Ji, Jörg Tiedemann

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Test-time scaling (TTS) has enhanced the performance of Reasoning Models (RMs) on various tasks such as math and coding, yet its efficacy in machine translation (MT) remains underexplored. This paper investigates whether increased inference-time computation improves translation quality. We evaluate 12 RMs across a diverse suite of MT benchmarks spanning multiple domains, examining three scenarios: direct translation, forced-reasoning extrapolation, and post-editing. Our findings show that for ge...

ID: 2510.06471v1 cs.CL, I.2.7

arXiv PDF

Показано 11 - 20 из 63 записей