dc.contributor: Universitat de Vic - Universitat Central de Catalunya. Facultat de Ciències i Tecnologia
dc.contributor: Universitat de Vic - Universitat Central de Catalunya. Màster Universitari en Anàlisi de Dades Òmiques
dc.contributor.author: Squitieri, Alessia
dc.date.accessioned: 2021-01-08T16:01:17Z
dc.date.available: 2021-01-08T16:01:17Z
dc.date.created: 2020-05-15
dc.date.issued: 2020-05-15
dc.identifier.uri: http://hdl.handle.net/10854/6410
dc.description: Academic year 2019-2020 [es]
dc.description.abstract: Metagenomics is a branch of bioinformatics that applies genomics techniques, such as DNA sequencing, to obtain information about microorganisms. In recent years, scientists have focused strongly on this field, highlighting its importance in both clinical and environmental settings. In this context, the lack of user-friendly software for metagenome analysis has become an important issue. GAIA is a bioinformatics tool, developed by Sequentia Biotech, designed to perform functional and taxonomic analyses of metagenomics data from both amplicon and whole genome sequencing. Like other software, GAIA can analyze data at the strain level. However, one limitation of GAIA is the high number of false positives that can arise during this type of analysis, owing to the high similarity between the genomes of different strains of the same species. We therefore worked on GAIA's ability to taxonomically classify bacterial strains from their sequences. We benchmarked different machine learning classification models. We also had to handle imbalanced data, a common machine learning problem, by testing different methods and comparing them with each other. Finally, we selected the best model through hyperparameter tuning. The results we obtained show a significant improvement in the accuracy of GAIA's predictions. [es]
dc.format: application/pdf [es]
dc.format.extent: 32 p. [es]
dc.language.iso: eng [es]
dc.rights: All rights reserved [es]
dc.subject.other: Genomics [es]
dc.subject.other: Genetic algorithms [es]
dc.subject.other: Bioinformatics [es]
dc.title: Development of a machine learning algorithm classification tool to improve strain detection in whole genome metagenomics dataset [es]
dc.type: info:eu-repo/semantics/masterThesis [es]
dc.description.version: Supervisor: Serrat Jurado, Josep Maria
dc.rights.accessRights: info:eu-repo/semantics/closedAccess [es]

