This thesis aims to develop efficient optimization algorithms for solving large-scale machine learning problems. To cope with the increasing scale and complexity of such problems, we focus on first-order and stochastic methods, in which updates are carried out using only (noisy) information about function values and (sub)gradients. The broad question we ask is: given access to such algorithmic oracles, how can we best design algorithms that both have strong theoretical guarantees and are practically useful? To this end, we touch upon a wide range of problems and investigate various methods (including newly proposed ones and successful existing ones) for solving them. We pay significant attention to developing and analyzing methods that are easy to run at ...