📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Yangyang Li
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Early detection of Alzheimer's Disease (AD) is greatly beneficial to AD
patients, leading to early treatments that lessen symptoms and alleviating
financial burden of health care. As one of the leading signs of AD, language
capability changes can be used for early diagnosis of AD. In this paper, I
develop a robust classification method using hybrid word embedding and
fine-tuned hyperparameters to achieve state-of-the-art accuracy in the early
detection of AD. Specifically, we create a hybrid wor...
📄 The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
2025-10-14Авторы:
Nizar El Ghazal, Antoine Caubrière, Valentin Vielzeuf
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
This paper presents a comparative study of context management strategies for
end-to-end Spoken Dialog State Tracking using Speech-LLMs. We systematically
evaluate traditional multimodal context (combining text history and spoken
current turn), full spoken history, and compressed spoken history approaches.
Our experiments on the SpokenWOZ corpus demonstrate that providing the full
spoken conversation as input yields the highest performance among models of
similar size, significantly surpassing pr...
Авторы:
Amruta Parulekar, Preethi Jyothi
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Standard ASR evaluation metrics like Word Error Rate (WER) tend to unfairly
penalize morphological and syntactic nuances that do not significantly alter
sentence semantics. We introduce an LLM-based scoring rubric LASER that
leverages state-of-the-art LLMs' in-context learning abilities to learn from
prompts with detailed examples. Hindi LASER scores using Gemini 2.5 Pro
achieved a very high correlation score of 94% with human annotations. Hindi
examples in the prompt were also effective in anal...
Авторы:
Qinhao Zhou, Xiang Xiang, Kun He, John E. Hopcroft
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
In recent years, the growing interest in Large Language Models (LLMs) has
significantly advanced prompt engineering, transitioning from manual design to
model-based optimization. Prompts for LLMs generally comprise two components:
the \textit{instruction}, which defines the task or objective, and the
\textit{input}, which is tailored to the instruction type. In natural language
generation (NLG) tasks such as machine translation, the \textit{input}
component is particularly critical, while the \t...
📄 Latent Speech-Text Transformer
2025-10-09Авторы:
Yen-Ju Lu, Yashesh Gaur, Wei Zhou, Benjamin Muller, Jesus Villalba, Najim Dehak, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Srinivasan Iyer, Duc Le
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Auto-regressive speech-text models are typically pre-trained on a large
number of interleaved sequences of text tokens and raw speech encoded as speech
tokens using vector quantization. These models have demonstrated
state-of-the-art performance in speech-to-speech understanding and generation
benchmarks, together with promising scaling laws, primarily enabled by the
representational alignment between text and speech. Nevertheless, they suffer
from shortcomings, partly owing to the disproportion...