📊 Digest statistics

Total digests: 34123; added today: 101

Last updated: today

📄 Contrastive Deep Learning for Variant Detection in Wastewater Genomic Sequencing

2025-12-04

Authors:

Adele Chinda, Richmond Azumah, Hemanth Demakethepalli Venkateswara

Annotation:

Wastewater-based genomic surveillance has emerged as a powerful tool for population-level viral monitoring, offering comprehensive insights into circulating viral variants across entire communities. However, this approach faces significant computational challenges stemming from high sequencing noise, low viral coverage, fragmented reads, and the complete absence of labeled variant annotations. Traditional reference-based variant calling pipelines struggle with novel mutations and require extensi...

ID: 2512.03158v1 cs.LG, q-bio.GN

arXiv PDF
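
The abstract is cut off before it describes the training objective. As a hedged illustration of what "contrastive" usually means in this setting, the sketch below implements a generic InfoNCE loss in NumPy. It assumes each read (or read window) is embedded twice under different augmentations, with matching rows forming the positive pairs. The function name `info_nce_loss` and the temperature of 0.1 are illustrative assumptions, not the authors' method.

```python
import numpy as np


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (hypothetical sketch, not the paper's objective).

    Row i of z1 and row i of z2 are two views of the same read;
    every other row in the batch serves as a negative.
    """
    # L2-normalise so dot products become cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Toy check: matched views should give a lower loss than mismatched ones.
views = np.eye(4)
aligned = info_nce_loss(views, views)
shuffled = info_nce_loss(views, views[[1, 2, 3, 0]])
print(aligned < shuffled)  # True
```

Lowering the temperature sharpens the softmax, so the loss concentrates more heavily on the hardest negatives in the batch.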

📄 MoRE: Batch-Robust Multi-Omics Representations from Frozen Pre-trained Transformers

2025-11-27

Authors:

Audrey Pei-Hsuan Chen

Annotation:

Representation learning on multi-omics data is challenging due to extreme dimensionality, modality heterogeneity, and cohort-specific batch effects. While pre-trained transformer backbones have shown broad generalization capabilities in biological sequence modeling, their application to multi-omics integration remains underexplored. We present MoRE (Multi-Omics Representation Embedding), a framework that repurposes frozen pre-trained transformers to align heterogeneous assays into a shared laten...

ID: 2511.20382v2 cs.LG, q-bio.GN

arXiv PDF

📄 MoRE: Batch-Robust Multi-Omics Representations from Frozen Pre-trained Transformers

2025-11-27

Authors:

Audrey Pei-Hsuan Chen

ID: 2511.20382v1 cs.LG, q-bio.GN

arXiv PDF

📄 A Hybrid Computational Intelligence Framework for scRNA-seq Imputation: Integrating scRecover and Random Forests

2025-11-25

Authors:

Ali Anaissi, Deshao Liu, Yuanzhe Jia, Weidong Huang, Widad Alyassine, Junaid Akram

Annotation:

Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling at cellular resolution but suffers from pervasive dropout events that obscure biological signals. We present SCR-MF, a modular two-stage workflow that combines principled dropout detection using scRecover with robust non-parametric imputation via missForest. Across public and simulated datasets, SCR-MF achieves robust and interpretable performance comparable to or exceeding existing imputation methods in most cases, while pr...

ID: 2511.16923v1 cs.LG, q-bio.GN

arXiv PDF
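
The abstract describes SCR-MF as two stages: dropout detection, then missForest imputation. The Python sketch below mirrors only that structure, under loudly stated assumptions. Stage 1 is a crude, hypothetical detection-rate rule standing in for scRecover (it is not scRecover's model). Stage 2 approximates missForest with scikit-learn's `IterativeImputer` driven by a `RandomForestRegressor`, which is similar in spirit but not identical to missForest. `flag_dropouts`, `impute_dropouts`, and `min_detection_rate` are hypothetical names.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (side effect: enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def flag_dropouts(counts: np.ndarray, min_detection_rate: float = 0.5) -> np.ndarray:
    """Stage 1 (hypothetical stand-in for scRecover).

    A zero is treated as a likely dropout only if its gene is non-zero in at
    least `min_detection_rate` of cells; all other zeros are kept as true zeros.
    """
    detection_rate = (counts > 0).mean(axis=0)  # per-gene fraction of non-zero cells
    return (counts == 0) & (detection_rate >= min_detection_rate)


def impute_dropouts(counts: np.ndarray, seed: int = 0) -> np.ndarray:
    """Stage 2: random-forest iterative imputation of the flagged entries only."""
    x = counts.astype(float)
    x[flag_dropouts(counts)] = np.nan  # only flagged zeros become missing values
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    # IterativeImputer fills the NaN entries and leaves observed values unchanged.
    return imputer.fit_transform(x)
```

Keeping detection separate from imputation means true biological zeros (here, zeros in rarely detected genes) are never overwritten, which is the main point of the two-stage design.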

📄 Rare Genomic Subtype Discovery from RNA-seq via Autoencoder Embeddings and Stability-Aware Clustering

2025-11-19

Authors:

Alaa Mezghiche

Annotation:

Unsupervised learning on high-dimensional RNA-seq data can reveal molecular subtypes beyond standard labels. We combine an autoencoder-based representation with clustering and stability analysis to search for rare but reproducible genomic subtypes. On the UCI "Gene Expression Cancer RNA-Seq" dataset (801 samples, 20,531 genes; BRCA, COAD, KIRC, LUAD, PRAD), a pan-cancer analysis shows clusters aligning almost perfectly with tissue of origin (Cramer's V = 0.887), serving as a negative control. We...

ID: 2511.13705v1 cs.LG, q-bio.GN

arXiv PDF
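
The abstract reports Cramér's V = 0.887 between clusters and tissue of origin. For reference, the standard (uncorrected) statistic is V = sqrt(χ² / (n · (min(r, c) − 1))), where χ² is Pearson's chi-squared on the r × c contingency table and n is the number of samples. The abstract does not say whether a bias-corrected variant was used, so the minimal NumPy version below assumes the plain formula; `cramers_v` is a hypothetical helper name.

```python
import numpy as np


def cramers_v(labels_a, labels_b) -> float:
    """Cramér's V between two categorical labelings (no bias correction)."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # contingency table of co-occurrence counts
    n = table.sum()
    # Expected counts under independence: row total * column total / n.
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-squared statistic
    k = min(table.shape) - 1
    if k == 0:
        raise ValueError("each labeling needs at least two categories")
    return float(np.sqrt(chi2 / (n * k)))


clusters = [0, 0, 1, 1, 2, 2]
tissues = ["BRCA", "BRCA", "LUAD", "LUAD", "KIRC", "KIRC"]
print(round(cramers_v(clusters, tissues), 6))  # 1.0
```

A value of 1 means each cluster maps to exactly one tissue, so 0.887 indicates clusters that track tissue of origin closely but not perfectly.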

📄 Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

2025-11-08

Authors:

Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak

Annotation:

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we re...

ID: 2511.02986v1 stat.ML, cs.LG, q-bio.GN

arXiv PDF
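
The abstract is truncated before the model details. "Latent diffusion" generally means running a diffusion process on the latent codes of an autoencoder rather than on raw counts. As a hedged sketch, the snippet below shows only the standard DDPM forward (noising) step on such latents, z_t = sqrt(ᾱ_t) · z_0 + sqrt(1 − ᾱ_t) · ε, using a commonly used linear β schedule. The schedule values, the function names, and the idea that `z0` comes from an expression autoencoder are all assumptions, not the paper's configuration.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(z0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Noise latent codes z0 to step t: z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
z0 = rng.standard_normal((8, 16))  # hypothetical latents for 8 cells, 16 latent dims
z_early = q_sample(z0, 0, alpha_bar, rng)    # almost identical to z0
z_late = q_sample(z0, 999, alpha_bar, rng)   # almost pure Gaussian noise
```

A denoising network is then trained to reverse this process; generated latents are decoded back to expression profiles by the autoencoder's decoder.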

📄 Hierarchical Bayesian Model for Gene Deconvolution and Functional Analysis in Human Endometrium Across the Menstrual Cycle

2025-11-04

Authors:

Crystal Su, Kuai Yu, Mingyuan Shao, Daniel Bauer

Annotation:

Bulk tissue RNA sequencing of heterogeneous samples provides averaged gene expression profiles, obscuring cell type-specific dynamics. To address this, we present a probabilistic hierarchical Bayesian model that deconvolves bulk RNA-seq data into constituent cell-type expression profiles and proportions, leveraging a high-resolution single-cell reference. We apply our model to human endometrial tissue across the menstrual cycle, a context characterized by dramatic hormone-driven cellular composi...

ID: 2510.27097v1 cs.LG, q-bio.GN

arXiv PDF
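
The paper's hierarchical Bayesian model is not reproduced here. As a simpler, non-Bayesian point of reference, most deconvolution methods rest on a linear mixing assumption, bulk ≈ S · p, where S holds reference cell-type expression profiles (for example, averaged from a single-cell atlas) and p holds cell-type proportions. The sketch below solves that assumed model with non-negative least squares via `scipy.optimize.nnls`. `deconvolve_nnls` is a hypothetical name, and NNLS is a swapped-in baseline, not the authors' method.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(bulk: np.ndarray, signature: np.ndarray) -> np.ndarray:
    """Estimate cell-type proportions p from bulk ≈ signature @ p.

    bulk: (genes,) bulk expression vector.
    signature: (genes, cell_types) reference expression profiles.
    """
    weights, _residual = nnls(signature, bulk)  # least squares with weights >= 0
    total = weights.sum()
    if total == 0:
        raise ValueError("no non-zero weights; check the signature matrix")
    return weights / total  # rescale so proportions sum to 1


rng = np.random.default_rng(0)
signature = rng.gamma(2.0, 1.0, size=(200, 4))  # synthetic profiles: 200 genes, 4 cell types
true_p = np.array([0.5, 0.3, 0.15, 0.05])
bulk = signature @ true_p  # noiseless synthetic bulk sample
estimated_p = deconvolve_nnls(bulk, signature)
```

A Bayesian treatment adds priors and uncertainty over both proportions and cell-type profiles, which NNLS does not provide.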

📄 scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration

2025-11-01

Authors:

Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye

Annotation:

Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, while integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on pair information or prior correspondences, or require computing a global pairwise coupling matrix, limiting their scalability and flexibility. In this paper, we introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Represent...

ID: 2510.24987v1 q-bio.QM, cs.LG, q-bio.GN

arXiv PDF

📄 HyperHELM: Hyperbolic Hierarchy Encoding for mRNA Language Modeling

2025-10-01

Authors:

Max van Spengler, Artem Moskalev, Tommaso Mansi, Mangal Prakash, Rui Liao

Annotation:

Language models are increasingly applied to biological sequences like proteins and mRNA, yet their default Euclidean geometry may mismatch the hierarchical structures inherent to biological data. While hyperbolic geometry provides a better alternative for accommodating hierarchical data, it has yet to find a way into language modeling for mRNA sequences. In this work, we introduce HyperHELM, a framework that implements masked language model pre-training in hyperbolic space for mRNA sequences. Us...

ID: 2509.24655v1 cs.LG, q-bio.GN

arXiv PDF
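
The abstract does not say which model of hyperbolic space HyperHELM uses. Assuming the Poincaré ball, one common choice, the geodesic distance between points u and v inside the unit ball is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). Distances grow rapidly toward the boundary, which is why hierarchies embed well: roots sit near the origin and leaves near the edge. `poincare_distance` is a hypothetical helper, not part of the paper's code.

```python
import numpy as np


def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq_u = np.dot(u, u)
    sq_v = np.dot(v, v)
    sq_diff = np.dot(u - v, u - v)
    # Clamp the denominator so points extremely close to the boundary do not divide by zero.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))


origin = np.zeros(2)
near_origin = np.array([0.5, 0.0])
near_edge = np.array([0.95, 0.0])
print(poincare_distance(origin, near_origin) < poincare_distance(near_origin, near_edge))  # True
```

The Euclidean gap from 0.5 to 0.95 is smaller than from 0 to 0.5, yet the hyperbolic distance is larger, illustrating how space "expands" near the boundary.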

📄 Reverse-Complement Consistency for DNA Language Models

2025-09-25

Authors:

Mingqian Ma

## Context
A fundamental property of DNA is that a sequence and its reverse complement (RC) often carry the same biological meaning. However, current DNA language models often ignore this symmetry, which leads to inconsistent predictions for a sequence and its RC counterpart. This reduces the reliability and effectiveness of such models on genomic tasks. Our motivation is to build models that respect this symmetry and deliver accurate, reliable results across a range of genomic tasks.

## Method
We propose Reverse-Complement Consistency Regularization (RCCR), a simple, multi-task, model-agnostic approach to fine-tuning DNA language models. The core idea is to add a loss term that measures the discrepancy between the model's predictions for a sequence and for its RC counterpart. This builds a key biological property of DNA directly into training, improving the model's accuracy and its robustness to errors on RC counterparts.

## Results
We evaluated RCCR on three DNA language models: Nucleotide Transformer, HyenaDNA, and DNABERT-2. These models were tested on a broad range of genomic tasks, including sequence classification, regression, and profile prediction. RCCR substantially reduced the number of inconsistent predictions (RC flips) and lowered errors, while maintaining or even improving overall accuracy compared with RC data augmentation and test-time augmentation.

## Significance
The proposed approach has broad applications in genomics and bioinformatics. It yields models that more faithfully reflect the biological properties of DNA. A key advantage of RCCR is that it provides high reliability and effectiveness without requiring significant additional training resources. This makes it attractive for a variety of genomic tasks, including genome analysis, drug development, and transcriptomics research.

## Conclusions
We showed that RCCR substantially improves model robustness against inconsistent RC predictions while preserving high task accuracy. This approach could serve as a foundation for future biologically informed deep learning models. We plan further work on optimizing RCCR and applying it to more complex genomic tasks.

Annotation:

A fundamental property of DNA is that the reverse complement (RC) of a sequence often carries identical biological meaning. However, state-of-the-art DNA language models frequently fail to capture this symmetry, producing inconsistent predictions for a sequence and its RC counterpart, which undermines their reliability. In this work, we introduce Reverse-Complement Consistency Regularization (RCCR), a simple and model-agnostic fine-tuning objective that directly penalizes the divergence between ...

ID: 2509.18529v1 cs.LG, q-bio.GN

arXiv PDF
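
RCCR adds a term that penalizes the divergence between the model's predictions for a sequence and for its reverse complement; the abstract is truncated before naming the divergence. The NumPy sketch below assumes a symmetric KL divergence over class probabilities and a hypothetical weight of 0.1 on the penalty. All function names are illustrative, not the paper's API. It covers only sequence-level classification: for per-position profile outputs, the RC prediction would typically also need to be flipped along the position axis before comparison (not shown).

```python
import numpy as np

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G, N stays N), then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def rc_consistency_penalty(logits_fwd: np.ndarray, logits_rc: np.ndarray, eps: float = 1e-12) -> float:
    """Symmetric KL divergence between predictions for a sequence and for its RC (assumed choice)."""
    p, q = softmax(logits_fwd), softmax(logits_rc)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))


def rccr_loss(task_loss: float, logits_fwd: np.ndarray, logits_rc: np.ndarray, weight: float = 0.1) -> float:
    """Task loss plus a weighted RC-consistency term (weight is a hypothetical hyperparameter)."""
    return task_loss + weight * rc_consistency_penalty(logits_fwd, logits_rc)


print(reverse_complement("AACG"))  # CGTT
```

In training, `logits_fwd` and `logits_rc` would come from running the same model on a batch of sequences and on their reverse complements; the penalty is zero exactly when both predictions agree.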

Showing 1-10 of 15 entries