Injecting weight noise during training is a simple technique that was first proposed almost two decades ago. However, little is known about its convergence behavior. This paper studies the convergence of two weight-noise-injection-based training algorithms: multiplicative weight noise injection with weight decay and additive weight noise injection with weight decay. We consider their application to multilayer perceptrons with either linear or sigmoid output nodes. Let $w(t)$ be the weight vector, let $V(w)$ be the corresponding objective function of the training algorithm, let $\alpha > 0$ be the weight decay constant, and let $\mu(t)$ be the step size. We show that if $\mu(t) \to 0$, then with probability one $E[\|w(t)\|_2^2]$ is bounded and $\lim_{t \to \infty} \|w(t)\|_2$ exis...