
dc.contributor: Universitat de Vic - Universitat Central de Catalunya. Facultat de Ciències i Tecnologia
dc.contributor: Universitat de Vic - Universitat Central de Catalunya. Màster Universitari en Anàlisi de Dades Òmiques
dc.contributor.author: Pelegrí Sisó, M. Dolors
dc.date.accessioned: 2021-01-08T17:48:13Z
dc.date.available: 2021-01-08T17:48:13Z
dc.date.created: 2020-09
dc.date.issued: 2020-09
dc.identifier.uri: http://hdl.handle.net/10854/6415
dc.description: Academic year 2019-2020
dc.description.abstract: Motivation: The diversity and sheer volume of omics data have brought biology and biomedicine research and applications into the big data era. Most of the statistical methods currently used to analyze omic data were not designed to deal with big data; principal component analysis and multivariate methods for integrating multi-omic data are two examples. Efficient and scalable functions are therefore required to exploit the large amount of omic data currently available. Results: We developed a library called BigDataStatMeth, which includes functions to perform basic matrix operations and linear algebra on big matrices using HDF5 and Bioconductor's DelayedArray infrastructure. We tested its performance by comparing its computation time with that of base R functions. Our results showed that our implementation outperforms existing functions and that the improvement grows as the sample size increases. This package can serve as the basis for implementing statistical methods required for omic data with a large number of samples or features. As a proof of concept, we implemented PCA and Lasso regression within the same package, and we also created another Bioconductor package, mgcca, which implements Generalized Canonical Correlation Analysis (GCCA), a method used in multi-omic data integration. The algorithm we implemented allows individuals to be missing from one or more tables. The implemented methods have been used to analyze real omic data. We first used PCA to call genomic inversions in more than 400K individuals from UK Biobank. We then used GCCA to integrate multiple omic layers in data from TCGA.
dc.format: application/pdf
dc.format.extent: 11 p.
dc.language.iso: eng
dc.rights: All rights reserved
dc.subject.other: Bioinformatics
dc.subject.other: Big data
dc.title: Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration
dc.type: info:eu-repo/semantics/masterThesis
dc.description.version: Supervisor: Calle Rosingana, M. Luz
dc.rights.accessRights: info:eu-repo/semantics/openAccess

