Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration

Pelegrí Sisó, M. Dolors

dc.contributor	Universitat de Vic - Universitat Central de Catalunya. Facultat de Ciències i Tecnologia
dc.contributor	Universitat de Vic - Universitat Central de Catalunya. Màster Universitari en Anàlisi de Dades Òmiques
dc.contributor.author	Pelegrí Sisó, M. Dolors
dc.date.accessioned	2021-01-08T17:48:13Z
dc.date.available	2021-01-08T17:48:13Z
dc.date.created	2020-09
dc.date.issued	2020-09
dc.identifier.uri	http://hdl.handle.net/10854/6415
dc.description	Academic year 2019-2020	es
dc.description.abstract	Motivation: The volume and diversity of omics data have brought biology and biomedicine research into the big data era. Most statistical methods currently used to analyze omic data were not designed to handle big data; principal component analysis and multivariate methods for integrating multi-omic data are two examples. Efficient and scalable functions are therefore required to exploit the large amount of omic data currently available. Results: We developed a library called BigDataStatMeth, which includes functions to perform basic matrix operations and linear algebra on big matrices using HDF5 and Bioconductor’s DelayedArray infrastructure. We tested its performance by comparing its computation time with that of base R functions. Our results showed that our implementation outperforms the existing functions and that the improvement grows as sample size increases. This package can serve as the basis for implementing the statistical methods required for omic data with a large number of samples or features. As a proof of concept, we implemented PCA and Lasso regression within the same package, and we also created another Bioconductor package, mgcca, which implements Generalized Canonical Correlation Analysis (GCCA), a method used in multi-omic data integration. The implemented algorithm allows individuals to be missing from one or more tables. The implemented methods have been used to analyze real omic data. We first used PCA to call genomic inversions in more than 400K individuals from UK Biobank. Then, data from TCGA were used to integrate multiple omic layers using GCCA.	es
dc.format	application/pdf	es
dc.format.extent	11 p.	es
dc.language.iso	eng	es
dc.rights	All rights reserved	es
dc.subject.other	Bioinformatics	es
dc.subject.other	Big data	es
dc.title	Bioinformatic tools for Big Data in Omic studies with application to genomic inversion calling and multiomic data integration	es
dc.type	info:eu-repo/semantics/masterThesis	es
dc.description.version	Supervisor: Calle Rosingana, M. Luz
dc.rights.accessLevel	info:eu-repo/semantics/openAccess

Files in this item

Name: trealu_a2020_pelegri_mariadolo ...
Size: 6.091Mb
Format: PDF

This item appears in the following Collection(s)

Màster Universitari en Anàlisi de Dades Òmiques [89]
