Thanks to advances in deep learning and neural networks, end-to-end models have been successfully introduced into automatic speech recognition (ASR) and have achieved superior performance compared to conventional hybrid systems. End-to-end models simplify the traditional GMM-HMM pipeline by transcribing speech to text directly, which offers fast computation and short development time. The Transformer, the latest of these end-to-end models, has achieved great success not only in ASR but also in natural language processing and computer vision. Despite its strong performance, the Transformer architecture can be further improved to better suit the characteristics of ASR. To be more specific, ASR performance is greatly affected by the s...