Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi architecture of a single GPU supports uniform virtual addressing, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require the user to reconsider the parallel rendering algorithms. In this paper, we thoroughly investigate the NUMA-aware image compositing problem, which is the key final stage in sort-last parallel rendering. Based on a proven radix-k strategy, we find one optimal compositing algorithm, which takes advantage of NUMA architecture on the multi-GPU platform. We quantitatively analyze different image ...
Existing volume rendering methods, though capable of very effective visualizations, a...
Parallel image compositing has been widely studied over the past 20 years, as ...
GPUs achieve high throughput and power efficiency by employing many small single instruction multipl...
Parallel volume rendering offers a feasible solution to the large data visualization problem by dist...
The image compositing stages in cluster-parallel rendering for gathering and combining partial rende...
Collective communication operations can dominate the cost of large-scale parallel algorithms. Image ...
The only proven method for performing distributed-memory parallel rendering at large scales, tens of...
Achieving efficient scalable parallel rendering for interactive visualization applications on medium...
© 2004 Institute of Information Science Academia Sinica - The binary-swap (BS) and the p...
State of the art scientific simulations are currently working with data set sizes on the order of a ...
In the sort-last-sparse parallel volume rendering system on distributed memory multicomputers, as th...