Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration
dc.contributor | Universitat de Vic - Universitat Central de Catalunya. Facultat de Ciències i Tecnologia | |
dc.contributor | Universitat de Vic - Universitat Central de Catalunya. Màster Universitari en Anàlisi de Dades Òmiques | |
dc.contributor.author | Pelegrí Sisó, M. Dolors | |
dc.date.accessioned | 2021-01-08T17:48:13Z | |
dc.date.available | 2021-01-08T17:48:13Z | |
dc.date.created | 2020-09 | |
dc.date.issued | 2020-09 | |
dc.identifier.uri | http://hdl.handle.net/10854/6415 | |
dc.description | Academic year 2019-2020 | es |
dc.description.abstract | Motivation: The diversity and sheer volume of omics data are bringing biological and biomedical research and its applications into the big data era. Most of the statistical methods currently used to analyze omic data are not designed to deal with big data. Principal component analysis and multivariate methods for integrating multi-omic data are examples. Efficient and scalable functions are therefore required to exploit the large amount of omic data currently available. Results: We developed a library called BigDataStatMeth, which includes functions to perform basic matrix operations and linear algebra on big matrices using HDF5 and Bioconductor's DelayedArray infrastructure. We tested its performance by comparing its computation time with that of base R functions. Our results showed that our implementation outperforms existing functions and that the improvement grows as sample size increases. This package can serve as the basis for implementing statistical methods required for omic data with a large number of samples or features. As a proof of concept, we implemented PCA and Lasso regression within the same package, and we also created another Bioconductor package, mgcca, which implements Generalized Canonical Correlation Analysis (GCCA), a method used in multi-omic data integration. We implemented an algorithm that allows individuals to be missing from one or more tables. The implemented methods have been used to analyze real omic data. We first used PCA to call genotype inversions in more than 400K individuals from UK Biobank. Then, data from TCGA were used to integrate multiple omic layers using GCCA. | es |
dc.format | application/pdf | es |
dc.format.extent | 11 p. | es |
dc.language.iso | eng | es |
dc.rights | All rights reserved | es |
dc.subject.other | Bioinformatics | es |
dc.subject.other | Big data | es |
dc.title | Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration | es |
dc.type | info:eu-repo/semantics/masterThesis | es |
dc.description.version | Supervisor: Calle Rosingana, M. Luz | |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |