Crying is an important signal in early infancy that helps parents understand the needs of their baby and thereby provide timely care or soothing, or be reassured. Thanks to recent advances in signal processing, deep learning, and internet-of-things technologies, smart baby monitors equipped with a microphone and/or a video camera have attracted considerable attention as aids to parents in the baby's room. In this paper, we propose a two-step approach to automatically detect infant cries in continuous audio signals. We first identify and remove segments without clear sounds (background noise) using a volume-based thresholding algorithm, and then apply convolutional neural network (CNN) models to the remaining segments to detect infant cries. The CN...
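The first step of the two-step approach, volume-based thresholding, can be illustrated with a minimal sketch. The text above does not specify how volume is measured or which threshold is used, so everything here is an assumption made for illustration: volume is taken as the root-mean-square (RMS) amplitude of fixed-length frames, and the names `rms`, `keep_loud_segments`, `frame_len`, and `threshold`, as well as the toy values, are hypothetical and not taken from the paper.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples (assumed volume measure)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def keep_loud_segments(samples, frame_len, threshold):
    """Return (start, end) sample-index pairs of frames whose RMS >= threshold.

    Frames below the threshold are treated as background noise and dropped.
    Adjacent kept frames are merged into a single segment. A trailing partial
    frame (shorter than frame_len) is ignored.
    """
    segments = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) >= threshold:
            end = start + frame_len
            if segments and segments[-1][1] == start:
                # Extend the previous segment instead of starting a new one.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Toy signal with four frames of four samples each: quiet, loud, loud, silent.
signal = [0.01, -0.01, 0.01, -0.01,
          0.5, -0.5, 0.5, -0.5,
          0.4, -0.4, 0.4, -0.4,
          0.0, 0.0, 0.0, 0.0]

# Hypothetical threshold of 0.1; only the two loud frames should survive.
segments = keep_loud_segments(signal, frame_len=4, threshold=0.1)
print(segments)
```

In a pipeline of this kind, only the retained segments would be passed on to the second-step classifier, which reduces the amount of audio the CNN has to process.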