This whitepaper investigates the parallel performance of a sample application that implements an approximate expectation-maximization method for inferring the network structure and time-varying states of a hidden population within the framework of the kinetic Ising model. Networks that yield informative results can be made arbitrarily large, and the long-running computational demand is highly localized, making the application a strong candidate for future exascale platforms. Previous investigations using OpenMP on the Intel Xeon Phi architecture have suggested that the class of accelerator unit may play a significant part in attainable application performance. An OpenCL parallelization enables experiments with a variety of a...
Consistently growing architectural complexity and machine scales make creating accurate performance ...
Parallel programming is vital to fully utilize the multicore architectures that dominate the process...
This work describes my solution to the performance portability problem: between CPUs and GPUs in par...
This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level be...
Accelerator processors allow energy-efficient computation at high performance, especially for comput...
The internal structure of interactions in a hidden network can be inferred using a maximum likelihoo...
The size of data that can be fitted with a statistical model becomes restrictive when accounting for...
This paper investigates the development of a molecular dynamics code that is highly portable between...
The decline of Moore’s law has led to a fundamental shift in the design of micro-processor architect...
The next generation of supercomputers will feature a diverse mix of accelerator devices. The increas...
The two main thrusts of computational science are increasingly accurate predictions and faster calcu...
The article discusses possibilities of implementing a neural network in a parallel way. The issues o...
In-vivo and in-vitro experiments are routinely used in neuroscience to unravel brain functionality. ...
Recent technological advances have proliferated the available computing power, memory, and speed of ...