We investigate layered neural networks with a differentiable activation function and student vectors without a normalization constraint by means of equilibrium statistical physics. We consider the learning of perfectly realizable rules and find that the lengths of the student vectors diverge unless a proper weight decay term is added to the energy. With weight decay, the system undergoes a first-order phase transition between states with very long student vectors and states whose lengths are comparable to those of the teacher vectors. Additionally, in both configurations there is a phase transition between a specialized and an unspecialized phase. An anti-specialized phase with long student vectors exists in networks with a small number of hidden units.
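
To make the role of the weight decay term concrete, here is a minimal sketch of the kind of energy function typically studied in this student–teacher setting; the notation (number of hidden units $K$, number of examples $p$, decay strength $\gamma$, activation $g$) is illustrative and need not match the paper's own conventions:

\[
E\bigl(\{\mathbf{J}_i\}\bigr)
= \frac{1}{2}\sum_{\mu=1}^{p}
\left[\,\sum_{i=1}^{K} g\!\left(\frac{\mathbf{J}_i\cdot\boldsymbol{\xi}^\mu}{\sqrt{N}}\right)
- \sum_{i=1}^{K} g\!\left(\frac{\mathbf{B}_i\cdot\boldsymbol{\xi}^\mu}{\sqrt{N}}\right)\right]^2
+ \frac{\gamma}{2}\sum_{i=1}^{K} \mathbf{J}_i^{2},
\]

where the $\mathbf{J}_i$ are the unnormalized student vectors, the $\mathbf{B}_i$ define the realizable teacher rule, and $g$ is a differentiable activation (e.g.\ an error function). Since the rule is perfectly realizable, the training error alone does not penalize growing $|\mathbf{J}_i|$, which is why the quadratic weight decay term $\frac{\gamma}{2}\sum_i \mathbf{J}_i^2$ is needed to keep the student lengths finite.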