Many sorting algorithms have been studied in the past, but only a few can effectively exploit both SIMD instructions and thread-level parallelism. In this paper, we propose a new parallel sorting algorithm, called Aligned-Access sort (AA-sort), for shared-memory multiprocessors. The AA-sort algorithm takes advantage of SIMD instructions. The key to high performance is eliminating unaligned memory accesses that would reduce the effectiveness of SIMD instructions. We implemented and evaluated AA-sort on the PowerPC® 970MP and Cell Broadband Engine™. In summary, a sequential version of AA-sort using SIMD instructions outperformed IBM’s optimized sequential sorting library by a factor of 1.8 and GPUTeraSort using SIM...