The Genomic Data Commons (GDC) is a data platform for managing, processing, analyzing, and sharing cancer genomics data. The data processing component of the GDC is called the GDC Pipeline Automation System (GPAS). GPAS currently uses an on-premises cluster that uses virtual machines (VMs) and bare metal machines to run multiple bioinformatics pipelines. The GPAS has been used in production for over two years and valuable pipeline statistics are scattered in multiple databases across the platform. This dissertation presents a platform-wide statistics collecting service for the GPAS, and based the synthesized statistics, several performance issues have been identified and investigated. The first performance issue examined is that jobs on ...
Motivation: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformat...
The growing DNA data volumes that originate from novel efficient DNA sequencing methods expose new ch...
We live in a big-data era and simple serial bioinformatic pipelines can’t efficiently handle huge in...
The development of Next Generation Sequencing (NGS) technology resulted the rapid accumulation of a ...
A Genome Wide Association Study (GWAS) is an important bioinformatics method to associate variants w...
Life scientists are increasingly using whole genome sequencing (WGS) to ask and answer research ques...
Since the last decade, the main components of computer systems have been evolving, diversifying, to ...
Motivation: The growth of next-generation sequencing (NGS) has not only dramatically accelerated the...
The recent upsurge in the available amount of health data and the advances in next-generation sequen...
AbstractNowadays, High Performance Computing (HPC) systems commonly used in bioinformatics, such as ...
The changing landscape of genomics research and clinical practice has created a need for computation...
Massive exploitation of next-generation sequencing technologies requires dealing with both: huge amo...
Abstract — Heterogeneous administrative domains, operating sys-tems (OS), and libraries make it diff...
Next generation sequencing (NGS) has been a great success and is now a standard meth-od of research ...
Massive exploitation of next-generation sequencing technologies requires dealing with both: huge amo...
Motivation: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformat...
The growing DNA data volumes that originate from novel efficient DNA sequencing methods expose new ch...
We live in a big-data era and simple serial bioinformatic pipelines can’t efficiently handle huge in...
The development of Next Generation Sequencing (NGS) technology resulted the rapid accumulation of a ...
A Genome Wide Association Study (GWAS) is an important bioinformatics method to associate variants w...
Life scientists are increasingly using whole genome sequencing (WGS) to ask and answer research ques...
Since the last decade, the main components of computer systems have been evolving, diversifying, to ...
Motivation: The growth of next-generation sequencing (NGS) has not only dramatically accelerated the...
The recent upsurge in the available amount of health data and the advances in next-generation sequen...
AbstractNowadays, High Performance Computing (HPC) systems commonly used in bioinformatics, such as ...
The changing landscape of genomics research and clinical practice has created a need for computation...
Massive exploitation of next-generation sequencing technologies requires dealing with both: huge amo...
Abstract — Heterogeneous administrative domains, operating sys-tems (OS), and libraries make it diff...
Next generation sequencing (NGS) has been a great success and is now a standard meth-od of research ...
Massive exploitation of next-generation sequencing technologies requires dealing with both: huge amo...
Motivation: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformat...
The growing DNA data volumes that originate from novel efficient DNA sequencing methods expose new ch...
We live in a big-data era and simple serial bioinformatic pipelines can’t efficiently handle huge in...