A new digital VLSI architecture has been presented for the implementation of discrete-time multilayer CNNs. At the functional level, the architecture is organized as 12 layers of $64 \times 64$ cells, which interact as specified by a set of 3-D generalized templates. At the structural level, cloning templates are applied by a set of processing units programmed by instruction masks, which are generated from the algorithm to be emulated. It has been shown that this architecture is applicable to multilayer algorithms for visual processing as well as to standard CNNs, including those that use sequences of templates or that work in parallel. Simulations indicate the high efficiency of this implementation.
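
As an illustration of the functional organization summarized above, the following minimal sketch performs one synchronous update of a multilayer discrete-time CNN in which every cell of a target layer receives contributions from the $3 \times 3$ neighborhoods of all source layers. It is not the architecture's actual datapath: the hard-limiter output, the zero boundary condition, the $3 \times 3$ neighborhood size, and the names `dtcnn_step`, `A`, `B`, and `bias` are assumptions made for the example, and the processing units and instruction masks are not modeled.

```python
def sign(v):
    """Hard-limiter output (assumed convention: +1 for v >= 0, else -1)."""
    return 1 if v >= 0 else -1


def dtcnn_step(y, u, A, B, bias):
    """One synchronous update of a multilayer discrete-time CNN (sketch).

    y, u : nested lists indexed [layer][row][col], cell outputs and inputs
    A, B : nested lists indexed [target_layer][source_layer][3][3],
           hypothetical feedback and control templates spanning all layers
    bias : list indexed [target_layer], constant offset per layer
    Neighbors outside the grid contribute 0 (assumed boundary condition).
    """
    layers, rows, cols = len(y), len(y[0]), len(y[0][0])
    y_next = [[[0] * cols for _ in range(rows)] for _ in range(layers)]
    for t in range(layers):                  # target layer
        for i in range(rows):
            for j in range(cols):
                state = bias[t]
                for s in range(layers):      # source layer (third template dimension)
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            r, c = i + di, j + dj
                            if 0 <= r < rows and 0 <= c < cols:
                                state += A[t][s][di + 1][dj + 1] * y[s][r][c]
                                state += B[t][s][di + 1][dj + 1] * u[s][r][c]
                y_next[t][i][j] = sign(state)
    return y_next
```

With 12 layers of $64 \times 64$ cells, `y` and `u` would each hold $12 \times 64 \times 64$ values. A standard single-layer CNN corresponds to templates that are zero whenever the source and target layers differ.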