Use this identifier to cite or link to this record:
https://hdl.handle.net/1822/90084
Title: | Scalable transcriptomics analysis with Dask: applications in data science and machine learning |
Author(s): | Moreno, Marta; Vilaça, Ricardo Manuel Pereira; Ferreira, Pedro G. |
Keywords: | Data Science; Python; Dask; Transcriptomics analysis; Machine learning; Scalable data science; Gene expression; Transcriptomics; Data analysis |
Date: | 30-Nov-2022 |
Publisher: | BMC |
Journal: | BMC Bioinformatics |
Abstract(s): | Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science, including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures are made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures. |
Type: | Article |
URI: | https://hdl.handle.net/1822/90084 |
DOI: | 10.1186/s12859-022-05065-3 |
ISSN: | 1471-2105 |
Publisher version: | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05065-3 |
Peer reviewed: | yes |
Access: | Open access |
Appears in collections: | HASLab - Artigos em revistas internacionais |