As deep learning has been adopted across diverse domains, the inference stage has become increasingly important for deploying models on multiple computing platforms. Among the many deep learning frameworks that support freezing and deploying trained models, NVIDIA TensorRT is the leading framework developed exclusively for inference: it allows developers to optimize a trained model for high-performance inference. While it has been shown extensively that TensorRT can significantly boost inference performance, quantitative studies are lacking on how its assorted optimization strategies improve inference relative to other well-known deep learning frameworks such as TensorFlow. This thesis presents such a study, which consists...