Variant calling is a classic example of DNA data analysis, in which variants from DNA sequence data are identified. In recent years, a number of frameworks have been developed for implementing the variant calling pipeline. As a result, deeper scientific insights have been gained into genomics. From the computational point of view, however, these frameworks suffer from either excessive serialization and deserialization or lack of scalability. This work proposes using Apache Arrow as the unified in-memory data format and employing its associated communication protocol Flight to enable horizontal scalability for the sorting stage of a variant calling pipeline. Compared to other protocols, Arrow Flight features high throughput, low overhead, an...
A revolution in personalized genomics will occur when scientists can sequence genomes of millions of...
Background: New high-throughput technologies, such as massively parallel sequencing, have transforme...
With the cost of sequencing a human genome dropping below $1,000, population-scale sequencing has be...
The ever increasing pace of advancements in sequencing technologies has enabled rapid DNA/genome seq...
The rapidly growing size of genomics data bases, driven by advances in sequencing technologies, dema...
Background: Immense improvements in sequencing technologies enable producing large amounts of high t...
Background Recently many new deep learning–based variant-calling methods like DeepVariant have emerg...
The growing DNA data volumes that originate from novel efficient DNA sequencing methods expose new ch...
Genomics and Next Generation Sequencers (NGS) like Illumina Hiseq produce data in the order of 200 b...
Background The accurate detection of somatic variants from sequencing data is of key importance for ...
Due to the rapid decrease in the cost of NGS (Next Generation Sequencing), interest has increased in...
Genomic repeats, i.e., pattern searching in the string processing process to find repeated base pair...
In the past few years, considerable attention has been paid to reduce the computational time for the...
Abstract Background The advance of next generation sequencing enables higher throughput with lower p...
DNA carries all the information needed to define the genetic structure of an individual. The large a...
A revolution in personalized genomics will occur when scientists can sequence genomes of millions of...
Background: New high-throughput technologies, such as massively parallel sequencing, have transforme...
With the cost of sequencing a human genome dropping below $1,000, population-scale sequencing has be...
The ever increasing pace of advancements in sequencing technologies has enabled rapid DNA/genome seq...
The rapidly growing size of genomics data bases, driven by advances in sequencing technologies, dema...
Background: Immense improvements in sequencing technologies enable producing large amounts of high t...
Background Recently many new deep learning–based variant-calling methods like DeepVariant have emerg...
The growing DNA data volumes that originate from novel efficient DNA sequencing methods expose new ch...
Genomics and Next Generation Sequencers (NGS) like Illumina Hiseq produce data in the order of 200 b...
Background The accurate detection of somatic variants from sequencing data is of key importance for ...
Due to the rapid decrease in the cost of NGS (Next Generation Sequencing), interest has increased in...
Genomic repeats, i.e., pattern searching in the string processing process to find repeated base pair...
In the past few years, considerable attention has been paid to reduce the computational time for the...
Abstract Background The advance of next generation sequencing enables higher throughput with lower p...
DNA carries all the information needed to define the genetic structure of an individual. The large a...
A revolution in personalized genomics will occur when scientists can sequence genomes of millions of...
Background: New high-throughput technologies, such as massively parallel sequencing, have transforme...
With the cost of sequencing a human genome dropping below $1,000, population-scale sequencing has be...