Prediction of tumor patterns through the integration of clinical and transcriptomics data
Publication date
2025-09-09
Abstract
Cancer remains a leading cause of mortality worldwide, largely due to its remarkable
heterogeneity and the lack of robust molecular classifiers that can capture the complexity
of tumor biology beyond histopathological criteria. Despite major advances in molecular
oncology, most transcriptomic studies have focused primarily on global gene expression,
overlooking other regulatory layers such as promoter activity, alternative splicing, or
tumor microenvironment composition. This narrow focus limits the discovery of
biomarkers and constrains the development of predictive tools for precision oncology.
Here, we present tumorProfiler, a modular analytical framework that integrates multiple
transcriptomic dimensions, including promoter activity, alternative splicing (Percent
Spliced-In, PSI) and gene expression, into predictive models for tumor classification.
Using high-quality RNA-seq data from the Pan-Cancer Analysis of Whole Genomes
(PCAWG) cohort (n = 305 donors), we systematically characterized differential promoter
activity, exon inclusion patterns, gene deregulation, and immune–stromal profiles across
ten tumor types and intra-organ progression subtypes. Six supervised learning models
were constructed, combining interpretable machine learning algorithms such as Random
Forest with automated frameworks for benchmarking. Our results reveal that promoter
activity and gene expression consistently outperform splicing events and cell
composition in multiclass tumor prediction, achieving high accuracy and generalization
capacity (AUC > 0.95; out-of-bag error < 7%). While splicing-event-based models captured
biologically meaningful variation, their predictive power was more limited. Importantly,
variable importance analyses highlighted a reduced subset of promoters, splicing events,
and genes as candidate biomarkers with potential translational relevance. Altogether,
this work demonstrates that transcriptomic regulation in cancer operates through
complementary layers of molecular information, each contributing differently to tumor
identity and progression. By integrating these layers, tumorProfiler provides a flexible
and interpretable platform for patient stratification, biomarker discovery, and the design
of precision therapies. Although currently a computational proof of concept, its modular
design and discovery potential open avenues for experimental validation and future
clinical translation.
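
For readers unfamiliar with the Percent Spliced-In (PSI) metric named in the abstract, the following is a minimal sketch of how PSI is commonly computed for a single exon from read counts. It is not taken from tumorProfiler: the function name `percent_spliced_in`, the `min_reads` coverage threshold, and the example counts are all illustrative assumptions.

```python
from typing import Optional


def percent_spliced_in(
    inclusion_reads: int,
    exclusion_reads: int,
    min_reads: int = 10,  # assumed coverage threshold, not from the thesis
) -> Optional[float]:
    """Return PSI as a fraction in [0, 1], or None if coverage is too low.

    PSI = inclusion reads / (inclusion reads + exclusion reads)
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    # With too few reads the ratio is unreliable, so report it as missing.
    if total < min_reads:
        return None
    return inclusion_reads / total


# Hypothetical counts: 30 reads support exon inclusion, 10 support skipping.
print(percent_spliced_in(30, 10))
```

Returning `None` for low-coverage events, rather than a ratio, is one possible design choice; it keeps poorly supported events from entering a downstream classifier as if they were measured.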
Document type
Master's thesis
Document version
Published version
Language
English
Pages
32 p.
Published by
Universitat de Vic - Universitat Central de Catalunya
Note
Academic year 2024-2025
Pujolassos Tanyà, Meritxell
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/4.0/

