In the near future, we will be surrounded by intelligent devices that transform the way we interact with the world. These devices need to acquire and process data to derive actions and interpretations in order to automate/monitor many tasks without human intervention. Such tasks require the implementation of complex machine learning algorithms on these devices. Deep neural networks (DNNs) have evolved into the state-of-the-art approach for machine learning tasks. However, realizing computationally intensive machine learning (ML) algorithms such as DNNs under stringent constraints on energy, latency, and form-factor is a formidable challenge. In conventional von Neumann architectures, the energy and latency cost of realizing ML algorithms ...