Object recognition is one of the most active fields of computer vision. In this thesis we consider two problems: recognition of object categories (a car, a pedestrian) and recognition of object instances (Mr Smith's car, Mr Smith himself). We use local object representations: an image is treated as a set of local regions, which is more robust and more flexible than a global representation. We focus in particular on bag-of-words methods, which discard the geometric relationships between local regions. We study the influence of each step of the algorithm and show that the parameter with the greatest influence on accuracy is the number of local regions sampled to describe the image. We thus propose to sample a large number of random loc...