[[abstract]]In this paper, we present a new parallel-in parallel-out systolic array with unidirectional data flow for performing the power-sum operation C+AB2 in finite fields GF(2m). The architecture employs the standard basis representation and can provide the maximum throughput in the sense of producing new results at a rate of one per clock cycle. It is highly regular, modular, and, thus, well-suited to VLSI implementation. As compared to a previous systolic power-sum circuit with bidirectional data flow and the same throughput performance, the proposed one has smaller latency, consumes less chip area, and can more easily incorporate fault-tolerant design. Based on the new power-sum circuit, we also propose a parallel-in parallel-out sy...