The generalization performance of kernel methods is largely determined by the choice of kernel, but the spectral representations of stationary kernels are both input-independent and output-independent, which limits their applicability to complicated tasks. In this paper, we propose an efficient learning framework that integrates the search for suitable kernels with model training. Using non-stationary spectral kernels and backpropagation with respect to the objective, we obtain favorable spectral representations that depend on both inputs and outputs. Furthermore, based on Rademacher complexity, we derive data-dependent generalization error bounds, investigate how the factors appearing in those bounds affect performance, and introduce corresponding regularization terms to improve it.
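The idea the abstract describes, learning a non-stationary spectral representation jointly with the predictor by backpropagation, can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the phase-network parameterization, layer sizes, and the simple norm penalty (standing in for a regularizer suggested by a complexity-based bound) are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class NonStationarySpectralFeatures(nn.Module):
    """Hypothetical trainable spectral feature map.

    The phases g(x) are produced by a small network, so the induced
    kernel k(x, x') = phi(x)^T phi(x') is not a function of x - x'
    alone, i.e. it is non-stationary, and its parameters are learned
    by backpropagation together with the rest of the model.
    """
    def __init__(self, in_dim, n_freq, hidden=32):
        super().__init__()
        self.n_freq = n_freq
        self.phase_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_freq))

    def forward(self, x):
        g = self.phase_net(x)                          # (batch, n_freq)
        # Cosine/sine features; their inner product defines the kernel.
        feats = torch.cat([torch.cos(g), torch.sin(g)], dim=-1)
        return feats / self.n_freq ** 0.5

# Joint kernel learning and model training on a toy regression task.
torch.manual_seed(0)
x = torch.linspace(-3, 3, 128).unsqueeze(-1)
y = torch.sin(2 * x) + 0.1 * torch.randn_like(x)

features = NonStationarySpectralFeatures(in_dim=1, n_freq=16)
head = nn.Linear(32, 1)                                # 2 * n_freq features
opt = torch.optim.Adam(
    list(features.parameters()) + list(head.parameters()), lr=1e-2)

for step in range(500):
    opt.zero_grad()
    pred = head(features(x))
    # Weight-norm penalty: an assumed stand-in for the regularization
    # terms a data-dependent generalization bound would suggest.
    loss = ((pred - y) ** 2).mean() + 1e-3 * head.weight.norm() ** 2
    loss.backward()
    opt.step()
```

Because both the spectral feature map and the predictor receive gradients from the training objective, the learned representation depends on the inputs and, through the loss, on the outputs as well.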