Recently, the Local-SOVA algorithm was suggested as an alternative to the max-Log MAP algorithm commonly used for decoding Turbo codes. In this work, we introduce new complexity reductions to the Local-SOVA algorithm, which allow an efficient implementation at a marginal BER penalty of 0.05 dB. Furthermore, we present the first hardware architectures for the computational units of the Local-SOVA algorithm, namely the add-compare-select unit and the soft output unit, targeting radix orders 2, 4 and 8. We provide place & route implementation results for 28 nm technology and demonstrate an area reduction of 46–75% for the soft output unit for radix orders ≥ 4 in comparison with the respective max-Log MAP soft output unit. These area reduction...