In this thesis, we investigate several optimization problems motivated by applications in modern machine learning. In the first part, we study the computational complexity of training ReLU neural networks. We consider the following decision problem: given a fully connected two-hidden-layer ReLU neural network with two ReLU nodes in the first hidden layer and one ReLU node in the second, do there exist edge weights such that the network fits the given data (see the sketch below)? We show that this problem is NP-hard. The main contribution is the design of a gadget that allows reducing the Separation by Two Hyperplanes problem to the ReLU neural network training problem. In the second part of the thesis, we look at the design and complexity a...
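For concreteness, a network of the kind considered in the first part computes a function of the following form, where $\max(0,\cdot)$ is the ReLU activation; the weight symbols $a_1, a_2, b_1, b_2, w_1, w_2, c$ are illustrative notation not fixed by the abstract, and the output is taken to be the single second-layer ReLU node:
\[
f(x) \;=\; \max\bigl(0,\; w_1 \max(0,\, a_1^{\top} x + b_1) \;+\; w_2 \max(0,\, a_2^{\top} x + b_2) \;+\; c\bigr).
\]
The training question then asks, given data points $(x_j, y_j)$ for $j = 1, \dots, n$, whether there exist weights such that $f(x_j) = y_j$ for all $j$.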