The bulk synchronous parallel (BSP) model promises scalable and portable software for a wide range of applications. A BSP computer consists of several processors, each with private memory, and a communication network that delivers access to remote memory in uniform time. Numerical linear algebra computations can benefit from the BSP model in terms of both simplicity and efficiency. Dense LU decomposition and other computations can be made more efficient by using the new technique of two-phase randomised broadcasting, which is motivated by a cost analysis in the BSP model. For LU decomposition with partial pivoting, this technique reduces the communication time by a factor of $(\sqrt{p}+1)/3$, where $p$ is the number of processors. Theoretical analysis, ...
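To illustrate why splitting a broadcast into two phases can pay off in the BSP cost model, the following minimal sketch compares a one-phase broadcast of n words from one processor to all p processors with a two-phase broadcast (scatter n/p words to each processor, then let every processor send its piece to all others). It assumes a simplified cost of h*g + l per communication superstep, where h is the maximum number of words any processor sends or receives. The function names, the choice of processor 0 as source, and the requirement that p divides n are illustrative assumptions. The randomisation in the technique described above is not modelled, and the $(\sqrt{p}+1)/3$ factor for LU decomposition is not reproduced here.

```python
"""Compare one-phase and two-phase broadcasting of n words among p processors
under a simplified BSP cost model: each communication superstep costs h*g + l.

Illustrative assumptions: the source is processor 0, p divides n, computation
is ignored, and the randomised aspect of the paper's technique is NOT modelled.
"""


def h_relation(sent, received):
    # h = max over processors of the larger of words sent and words received
    return max(max(s, r) for s, r in zip(sent, received))


def one_phase(n, p, src=0):
    # The source sends all n words directly to each of the other p-1 processors.
    sent, received = [0] * p, [0] * p
    for q in range(p):
        if q != src:
            sent[src] += n
            received[q] += n
    return [h_relation(sent, received)]  # a single superstep


def two_phase(n, p, src=0):
    piece = n // p
    # Phase 1: the source scatters piece q (n/p words) to processor q.
    sent, received = [0] * p, [0] * p
    for q in range(p):
        if q != src:
            sent[src] += piece
            received[q] += piece
    h1 = h_relation(sent, received)
    # Phase 2: every processor sends its own piece to all other processors.
    sent, received = [0] * p, [0] * p
    for q in range(p):
        for r in range(p):
            if r != q:
                sent[q] += piece
                received[r] += piece
    h2 = h_relation(sent, received)
    return [h1, h2]  # two supersteps


def bsp_cost(hs, g, l):
    # Sum of h*g + l over all communication supersteps.
    return sum(h * g + l for h in hs)


if __name__ == "__main__":
    n, p, g, l = 1200, 4, 10, 100
    print("one-phase:", one_phase(n, p), bsp_cost(one_phase(n, p), g, l))
    print("two-phase:", two_phase(n, p), bsp_cost(two_phase(n, p), g, l))
```

With n = 1200, p = 4, g = 10 and l = 100, the one-phase broadcast is a single 3600-relation costing 36100, while the two-phase broadcast is two 900-relations costing 18200 in total. In general the one-phase h is (p-1)n and the two-phase total is 2(p-1)n/p, so the communication volume shrinks by roughly p/2 at the price of one extra synchronisation l.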