Machine learning methods in personalized medicine: application to genomic data in Alzheimer's disease

Ballestà López, Mireia

Publication date

2018-09

URI http://hdl.handle.net/10854/5729

Abstract

The main goal of this project is to validate and compare machine learning methods for genome-wide association study (GWAS) analysis, using genomic data on Alzheimer's disease (AD). The data were imputed with the Michigan Imputation Server and pre-processed through quality control at both the SNP and the individual level. To reduce dimensionality, SNPs were filtered using different linkage disequilibrium (LD) thresholds (0.2, 0.4 and 0.6). The filtered data were then analysed with five machine learning methods: logistic regression, random forest, k-nearest neighbours, Gradient Boosting Machine (GBM) and deep neural networks. Model performance was compared using AUC, sensitivity, specificity and F-measure to evaluate the predictive capacity, or reliability, of the models. In addition, the best models were validated using KEGG pathways. We conclude that the best results are obtained with an LD threshold of 0.2. Of the five algorithms, GBM with an LD threshold of 0.2 was the best model for predicting AD, based on AUC, sensitivity, specificity and F-measure, and on validation of the results with KEGG pathways.
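The abstract compares the models by AUC, sensitivity, specificity and F-measure, but this record does not say how those metrics were computed. The following is a minimal sketch of the standard definitions only, assuming 0/1 labels with AD cases coded as 1 and a hypothetical decision threshold of 0.5 on predicted probabilities. The helper names (`confusion_counts`, `evaluate`, etc.) are illustrative and are not taken from the thesis.

```python
"""Standard binary-classification metrics (illustrative sketch, not the thesis code)."""


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for 0/1 labels, treating 1 (AD case) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    # True positive rate: fraction of AD cases classified as cases.
    return tp / (tp + fn)


def specificity(tn, fp):
    # True negative rate: fraction of controls classified as controls.
    return tn / (tn + fp)


def f_measure(tp, fp, fn):
    # F1 score: harmonic mean of precision and recall (sensitivity).
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random case scores above a random control.

    Ties count as 1/2 (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute all four metrics; the 0.5 threshold is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "AUC": auc(y_true, scores),
        "sensitivity": sensitivity(tp, fn),
        "specificity": specificity(tn, fp),
        "F-measure": f_measure(tp, fp, fn),
    }


if __name__ == "__main__":
    # Toy labels and predicted probabilities, not data from the thesis.
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
    print(evaluate(y, s))
```

Note that AUC depends only on how the scores rank cases against controls, whereas sensitivity, specificity and F-measure depend on the chosen threshold. This is one reason to report AUC alongside the threshold-based metrics when comparing classifiers.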

Document type

Master's thesis

Supervisor: Juan Ramón González

Director: Josep M. Serrat

Language

English

Keywords

Machine learning

Alzheimer's disease

Pages

39 p.

Note

Academic year 2017-2018

This item appears in the following collection(s)

Màster Universitari en Anàlisi de Dades Òmiques (University Master's Degree in Omics Data Analysis)

Rights

All rights reserved